{"id":17370000,"url":"https://github.com/pykong/merlin","last_synced_at":"2026-04-17T11:33:38.458Z","repository":{"id":195367924,"uuid":"637082931","full_name":"pykong/MERLIn","owner":"pykong","description":"Modular Extensible Reinforcement Learning Interface","archived":false,"fork":false,"pushed_at":"2023-10-11T08:43:06.000Z","size":4305,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-12T19:56:52.095Z","etag":null,"topics":["assignment","deep-q-learning","gym","iubh","pytorch","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pykong.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-05-06T13:03:23.000Z","updated_at":"2023-11-26T18:21:12.000Z","dependencies_parsed_at":"2024-12-06T15:43:10.784Z","dependency_job_id":"d00c34e7-1ace-4529-8f4a-7229b1757cb8","html_url":"https://github.com/pykong/MERLIn","commit_stats":null,"previous_names":["pykong/merlin"],"tags_count":3,"template":false,"template_full_name":"pykong/py-template-repo","purl":"pkg:github/pykong/MERLIn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FMERLIn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FMERLIn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FMERLIn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FMERLIn/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pykong","download_url":"https://codeload.github.com/pykong/MERLIn/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FMERLIn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31927964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-17T10:35:34.458Z","status":"ssl_error","status_checked_at":"2026-04-17T10:35:09.472Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assignment","deep-q-learning","gym","iubh","pytorch","reinforcement-learning"],"created_at":"2024-10-16T00:23:04.337Z","updated_at":"2026-04-17T11:33:38.441Z","avatar_url":"https://github.com/pykong.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003ca href=\"#readme\"\u003e\n        \u003cimg alt=\"MERLIn logo\" src=\"https://raw.githubusercontent.com/pykong/merlin/main/docs/logo.svg\"\u003e\n        \u003c!-- Logo credits: Benjamin Felder --\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/PyVersion/3.11/purple\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg 
alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Code-Quality/A+/green\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Black/OK/green\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Coverage/0.0/gray\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/MyPy/78.0/blue\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Docs/0.0/gray\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/pykong/merlin/main/LICENSE\"\u003e\u003cimg alt=\"License\" src=\"https://badgen.net/static/license/MIT/blue\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Build/1.0.0/pink\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/stars/★★★★★/yellow\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"#readme\"\u003e\n        \u003cimg alt=\"MERLIn training GIF\" src=\"https://github.com/pykong/merlin/blob/main/docs/merlin_train.gif?raw=true\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n# MERLIn\n\nMERLIn, short for `modular extensible reinforcement learning interface`, lets you easily define and run reinforcement learning experiments on top of [`PyTorch`](https://github.com/pytorch/pytorch) and [`Gym`](https://github.com/openai/gym).\n\nThis project started as a homework assignment for a reinforcement learning module in my Master's studies.\nI made it public, hoping you find it useful or interesting.\n\n## Usage\n\n### 0. Install\n\nMERLIn uses [`poetry`](https://python-poetry.org/) for dependency management.\nTo install all dependencies, run:\n\n```sh\npoetry install\n```\n\n### 1. 
Configure experiments\n\nExperiments are defined as [YAML](https://learnxinyminutes.com/docs/yaml/) files, which are merged with the default\nconfiguration before being passed into the main training loop. Parameters are\nidentical to the attributes of the `Config` class; a table of [all parameters](https://github.com/pykong/merlin/tree/polish#training-parameters) is\ngiven further down.\n\nExample:\n\n`experiments/experiment_one.yaml`\n\n```yaml\n---\nmax_episodes: 1000\nagent_name: dueling_dqn\nalpha: 0.05\n```\n\nThis will train the agent `dueling_dqn` for 1000 episodes at a learning rate\n`alpha` of 0.05, while all other parameters fall back to their default values\nas defined in the `Config` class.\n\n#### Nested element definitions\n\nUsing the `variants` array, different flavors of the same base configuration can\nbe defined as objects in that array. More deeply nested parameters override those\nhigher up. Variants can themselves be nested.\n\n##### variants Example\n\n```yaml\n---\nmax_episodes: 1000\nvariants:\n  - {}\n  - alpha: 0.01243\n  - max_episodes: 333\n    variants:\n      - gamma: 0.5\n        memory_size: 99000\n      - batch_size: 64\n```\n\nThe above configuration defines the following experiments:\n\n1. `max_episodes: 1000`\n2. `max_episodes: 1000` and `alpha: 0.01243`\n3. `max_episodes: 333`, `gamma: 0.5` and `memory_size: 99000`\n4. `max_episodes: 333` and `batch_size: 64`\n\n### 2. Start training\n\nAfter defining at least one experiment as described in the previous section, start training with the following command:\n\n`poetry run train`\n\n#### Training in the background\n\nTo let training continue beyond the shell session, start it in the background with the following script:\n\n`./scripts/traing_bg.sh`\n\nThe script also follows the generated log statements to provide continuous console\noutput.\n\n### 3. 
Results\n\n#### Console output\n\nDuring training, the following values are continuously logged to the console:\n\n1. episode index\n2. epsilon\n3. reward\n4. train loss\n5. episode steps\n6. total episode time\n\nSpecial events like model saving or video recording are also logged when they\noccur.\n\n#### File output\n\nEach experiment generates a subfolder in the `results/` directory. Within\nthat subfolder, the following files are placed:\n\n1. `experiment.yaml`: The exact parameters the experiment was run with.\n2. A log file holding the training logs, as printed to the console (see the\n   previous section).\n3. Model checkpoints.\n4. Video files of selected episode runs.\n5. Images of the preprocessed state (optional).\n\n### Statistical Analysis\n\nMERLIn automatically conducts a crude statistical analysis of the experimental results after training.\nYou can manually trigger the analysis by running: `poetry run analyze \u003cpath/to/experiment/results\u003e`.\nAnalysis results are written to an `analysis/` subfolder of the results directory.\n\n#### Summarization\n\nAs of `v1.0.0`, only the last 2,000 episodes are used to compare different algorithms, based on the hard-coded assumption that rewards have plateaued by then.\nThe statistical analysis aggregates all runs of each variant and calculates the following:\n\n- mean reward\n- std reward\n- lower bound of the confidence interval for mean reward\n- mean steps\n- std steps\n\n#### Plots\n\nLine plots of rewards over episodes and histograms showing the reward distribution of all variants are produced.\n\n\u003cp float=\"left\"\u003e\n  \u003cimg alt=\"MERLIn logo\" src=\"https://raw.githubusercontent.com/pykong/merlin/main/docs/reward.svg\" width=\"49%\" /\u003e\n  \u003cimg alt=\"MERLIn logo\" src=\"https://raw.githubusercontent.com/pykong/merlin/main/docs/reward_dist.svg\" width=\"45%\"/\u003e\n\u003c/p\u003e\n\n### Training Parameters\n\nBelow is an overview of the parameters to configure experiments.\n\n| Parameter 
Name               | Description                                                                                      | Optional | Default      |\n|------------------------------|--------------------------------------------------------------------------------------------------|----------|--------------|\n| experiment                   | Unique id of the experiment.                                                                     | No       |              |\n| variant                      | Unique id of the variant of an experiment.                                                       | No       |              |\n| run                          | Unique id of the run of a variant.                                                               | Yes      | 0            |\n| run_count                    | The number of independent runs of an experiment.                                                 | Yes      | 3            |\n| env_name                     | The environment to be used.                                                                      | Yes      | 'pong'       |\n| frame_skip                   | The number of frames to skip per action.                                                         | Yes      | 4            |\n| input_dim                    | The input dimension of the model.                                                                | Yes      | 64           |\n| num_stacked_frames           | The number of frames to stack.                                                                   | Yes      | 4            |\n| step_penalty                 | Penalty given to the agent per step.                                                             | Yes      | 0.0          |\n| agent_name                   | The agent to be used.                                                                            | Yes      | 'double_dqn' |\n| net_name                     | The neural network to be used.                                         
                          | Yes      | 'linear_deep_net' |\n| target_net_update_interval   | The number of steps after which the target network should be updated.                            | Yes      | 1024         |\n| episodes                     | The number of episodes to train for.                                                             | Yes      | 5000         |\n| alpha                        | The learning rate of the agent.                                                                  | Yes      | 5e-6         |\n| epsilon_decay_start          | The episode to start epsilon decay on.                                                           | Yes      | 1000         |\n| epsilon_step                 | The absolute value to decrease epsilon by per episode.                                           | Yes      | 1e-3         |\n| epsilon_min                  | The minimum epsilon value for epsilon-greedy exploration.                                        | Yes      | 0.1          |\n| gamma                        | The discount factor for future rewards.                                                          | Yes      | 0.99         |\n| memory_size                  | The size of the replay memory.                                                                   | Yes      | 500,000      |\n| batch_size                   | The batch size for learning.                                                                     | Yes      | 32           |\n| model_save_interval          | The number of steps after which the model should be saved. If None, model will be saved at the end of epoch only. | Yes | None           |\n| video_record_interval        | Steps between video recordings.                                                                  | Yes      | 2500         |\n| save_state_img               | Whether to take images during training.                                                          
| Yes      | False        |\n| use_amp                      | Whether to use automatic mixed precision.                                                        | Yes      | True         |\n\n### Extending Agents, Environments, and Neural Networks\n\nMERLIn is designed to be modular and extensible, so you can quickly implement new agents, environments, and neural networks.\nTo add one, derive a new class from the respective abstract base class and register it in the corresponding registry.\n\n#### Example: Implementing a new Neural Network\n\nCreate a new Python module, `app/nets/new_net.py`, holding a new class deriving from `BaseNet`.\nYou must provide a unique name via the `name` property.\n\n```py\nfrom torch import nn\n\nfrom app.nets._base_net import BaseNet\n\n\nclass NewNet(BaseNet):\n    @classmethod\n    @property\n    def name(cls) -\u003e str:\n        return \"new_net\"  # give it a unique name here\n\n    def _define_net(\n        self, state_shape: tuple[int, int, int], num_actions: int\n    ) -\u003e nn.Sequential:\n        # your PyTorch network definition goes here\n        raise NotImplementedError\n```\n\nAdd `NewNet` to the registry of neural networks in `app/nets/__init__.py` to make it automatically available to the `make_net` factory function.\n\n```py\n\n...\n\nfrom app.nets.new_net import NewNet  # import the new class\n\nnet_registry = [\n    ...\n    NewNet,  # register here\n]\n\n...\n\n```\n\nThat's it. From now on, you can use the new network in your experiment definitions:\n\n```yaml\n---\nnet_name: new_net\n```\n\n### Scripts\n\nThe application comes with several bash scripts for common\ntasks.\n\n#### `check_cuda.sh` \u0026 `watch_gpu`\n\nPrint out information regarding the system's current CUDA installation and GPU usage for sanity-checking and troubleshooting.\n\n#### `install_atari.sh`\n\nInstalls the Atari ROMs used by `Gym` into the virtual environment.\n\n#### Sync scripts\n\nTypically, you want to offload the training workload to a cloud virtual machine. In this regard, `sync_up.sh` will upload sources and experiments to that machine.\nAfterward, the training results can be downloaded to your local system using\n`sync_down.sh`.\n\nThe connection data for both sync scripts is stored in the `sync.cfg` file.\n\n## Limitations\n\nThis project is more of a didactic exercise than an attempt to topple\nestablished reinforcement learning frameworks such as [`RLlib`](https://docs.ray.io/en/latest/rllib/index.html).\n\nAs of `v1.0.0`, the most significant limitations of MERLIn are:\n\n1. Only a single environment is implemented, namely `Pong`.\n2. Only a single class of agents is implemented, namely variations of `DQN`.\n3. Statistical analysis is rudimentary and does not run in parallel with training.\n\n### Contributions welcome\n\nIf you like MERLIn and want to develop it further, feel free to fork it and open a pull request. 🤓\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpykong%2Fmerlin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpykong%2Fmerlin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpykong%2Fmerlin/lists"}