{"id":24625298,"url":"https://github.com/seungjaeryanlee/implementations-nfq","last_synced_at":"2025-10-06T16:32:19.077Z","repository":{"id":50400775,"uuid":"151182501","full_name":"seungjaeryanlee/implementations-nfq","owner":"seungjaeryanlee","description":"Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method","archived":false,"fork":false,"pushed_at":"2019-08-01T07:03:30.000Z","size":438,"stargazers_count":30,"open_issues_count":2,"forks_count":9,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-05-02T03:46:51.594Z","etag":null,"topics":["deep-learning","deep-reinforcement-learning","machine-learning","paper","python","reinforcement-learning"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seungjaeryanlee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-02T01:05:24.000Z","updated_at":"2023-12-05T05:01:29.000Z","dependencies_parsed_at":"2022-09-07T11:51:39.567Z","dependency_job_id":null,"html_url":"https://github.com/seungjaeryanlee/implementations-nfq","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seungjaeryanlee%2Fimplementations-nfq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seungjaeryanlee%2Fimplementations-nfq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seungjaeryanlee%2Fimplementations-nfq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seungjaery
anlee%2Fimplementations-nfq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seungjaeryanlee","download_url":"https://codeload.github.com/seungjaeryanlee/implementations-nfq/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235534299,"owners_count":19005470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-reinforcement-learning","machine-learning","paper","python","reinforcement-learning"],"created_at":"2025-01-25T04:16:52.343Z","updated_at":"2025-10-06T16:32:13.770Z","avatar_url":"https://github.com/seungjaeryanlee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method\n\n[![black Build Status](https://img.shields.io/travis/com/seungjaeryanlee/implementations-nfq.svg?label=black)](https://black.readthedocs.io/en/stable/)\n[![flake8 Build Status](https://img.shields.io/travis/com/seungjaeryanlee/implementations-nfq.svg?label=flake8)](http://flake8.pycqa.org/en/latest/)\n[![isort Build Status](https://img.shields.io/travis/com/seungjaeryanlee/implementations-nfq.svg?label=isort)](https://pypi.org/project/isort/)\n[![pytest Build Status](https://img.shields.io/travis/com/seungjaeryanlee/implementations-nfq.svg?label=pytest)](https://docs.pytest.org/en/latest/)\n\n[![numpydoc Docstring 
Style](https://img.shields.io/badge/docstring-numpydoc-blue.svg)](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue.svg)](https://pre-commit.com/)\n\nThis repository is an implementation of the paper [Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method (Riedmiller, 2005)](/paper.pdf).\n\n**Please ⭐ this repository if you found it useful!**\n\n\n---\n\n### Table of Contents 📜\n\n- [Summary](#summary-)\n- [Installation](#installation-)\n- [Running](#running-)\n- [Results](#results-)\n- [Differences from the Paper](#differences-from-the-paper-)\n- [Reproducibility](#reproducibility-)\n\nFor implementations of other deep learning papers, check the **[implementations](https://github.com/seungjaeryanlee/implementations) repository**!\n\n---\n\n### Summary 📝\n\nNeural Fitted Q Iteration (NFQ) uses a neural network as its Q-network: the input is the observation (s) and the action (a), and the output is the action value Q(s, a). Instead of online Q-learning, the paper proposes **batch offline updates**: experience is collected throughout the episode, and the network is then updated on that batch. The paper also proposes the **hint-to-goal** method, in which the network is trained explicitly on the goal region so that it correctly estimates the value of goal states.\n\n### Installation 🧱\n\nFirst, clone this repository from GitHub. 
Since this repository contains submodules, you should use the `--recursive` flag.\n\n```bash\ngit clone --recursive https://github.com/seungjaeryanlee/implementations-nfq.git\n```\n\nIf you cloned the repository without the flag, you can fetch the submodules separately with the `git submodule` command, run from inside the repository:\n\n```bash\ngit clone https://github.com/seungjaeryanlee/implementations-nfq.git\ncd implementations-nfq\ngit submodule update --init --recursive\n```\n\nAfter cloning the repository, install the required PyPI packages with [requirements.txt](/requirements.txt):\n\n```bash\npip install -r requirements.txt\n```\n\nYou can read more about each package in the comments of the [requirements.txt](/requirements.txt) file!\n\n### Running 🏃\n\nYou can train the NFQ agent on Cartpole Regulator with the provided configuration file:\n\n```bash\npython train_eval.py -c cartpole.conf\n```\n\nFor a reproducible run, use the `--RANDOM_SEED` flag:\n\n```bash\npython train_eval.py -c cartpole.conf --RANDOM_SEED=1\n```\n\nTo save a trained agent, use the `--SAVE_PATH` flag:\n\n```bash\npython train_eval.py -c cartpole.conf --SAVE_PATH=saves/cartpole.pth\n```\n\nTo load a trained agent, use the `--LOAD_PATH` flag:\n\n```bash\npython train_eval.py -c cartpole.conf --LOAD_PATH=saves/cartpole.pth\n```\n\nTo enable logging to TensorBoard or W\u0026B, use the corresponding flags:\n\n```bash\npython train_eval.py -c cartpole.conf --USE_TENSORBOARD --USE_WANDB\n```\n\n### Results 📊\n\nThis repository uses **TensorBoard** for offline logging and **Weights \u0026 Biases** for online logging. 
You can see all the metrics in [my summary report at Weights \u0026 Biases](https://app.wandb.ai/seungjaeryanlee/implementations-nfq/reports?view=seungjaeryanlee%2FSummary)!\n\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"Train Episode Length\" src=\"https://user-images.githubusercontent.com/6107926/62005353-07af6e80-b16d-11e9-8fc9-798af69de2e4.png\" width=\"49%\"\u003e\n  \u003cimg alt=\"Evaluation Episode Length\" src=\"https://user-images.githubusercontent.com/6107926/62005354-08480500-b16d-11e9-9c03-facb5f3c6b87.png\" width=\"49%\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"Train Episode Cost\" src=\"https://user-images.githubusercontent.com/6107926/62005355-08480500-b16d-11e9-9b82-6516677deec6.png\" width=\"49%\"\u003e\n  \u003cimg alt=\"Evaluation Episode Cost\" src=\"https://user-images.githubusercontent.com/6107926/62005356-08480500-b16d-11e9-95ed-09259728e1c3.png\" width=\"49%\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"Total Cycle\" src=\"https://user-images.githubusercontent.com/6107926/62005359-08e09b80-b16d-11e9-949a-88313763992d.png\" width=\"32%\"\u003e\n  \u003cimg alt=\"Total Cost\" src=\"https://user-images.githubusercontent.com/6107926/62005360-08e09b80-b16d-11e9-9c89-a4f0f4e075a6.png\" width=\"32%\"\u003e\n  \u003cimg alt=\"Train Loss\" src=\"https://user-images.githubusercontent.com/6107926/62005357-08480500-b16d-11e9-91ca-52368d49dce5.png\" width=\"32%\"\u003e\n\u003c/p\u003e\n\n### Differences from the Paper 👥\n\n- Of the three environments in the paper (Pole Balancing, Mountain Car, Cartpole Regulator), only Cartpole Regulator, the most difficult one, was implemented and tested.\n- For Cartpole Regulator, the success condition is relaxed: a state counts as successful whenever the pole angle is within 24 degrees of upright. 
In the original paper, the cart must also be at the center, within a tolerance of 0.05.\n- The trained policy is evaluated in only 1 evaluation environment instead of 1000.\n\n### Reproducibility 🎯\n\nAlthough no open-source code accompanies the paper, it gives enough detail to implement NFQ. However, the results were not fully reproducible: we had to relax the definition of goal states and simplify evaluation. Still, the agent learned to balance the pole for 3000 steps while training only on 100-step episodes.\n\nA few nits:\n\n- The paper does not specify the pole angles for goal and forbidden states. We treat states within 24 degrees of upright as goal states and states 90 degrees or more from upright as forbidden.\n- The paper randomly initializes network weights within [−0.5, 0.5], but does not mention bias initialization.\n- The goal velocity of the success states is not mentioned. We use a normal distribution to randomly generate velocities for the hint-to-goal variant.\n- It is unclear whether experience should be added before or after training the agent in each epoch. We assume it is added before training.\n- The learning rate for the Rprop optimizer is not specified.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseungjaeryanlee%2Fimplementations-nfq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseungjaeryanlee%2Fimplementations-nfq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseungjaeryanlee%2Fimplementations-nfq/lists"}