{"id":13710506,"url":"https://github.com/hardmaru/estool","last_synced_at":"2025-04-13T16:36:26.815Z","repository":{"id":26396426,"uuid":"108713583","full_name":"hardmaru/estool","owner":"hardmaru","description":"Evolution Strategies Tool","archived":false,"fork":false,"pushed_at":"2022-12-08T10:43:25.000Z","size":7391,"stargazers_count":936,"open_issues_count":8,"forks_count":163,"subscribers_count":32,"default_branch":"master","last_synced_at":"2024-11-13T21:44:26.311Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hardmaru.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-29T07:20:10.000Z","updated_at":"2024-11-12T09:42:49.000Z","dependencies_parsed_at":"2023-01-14T07:15:15.194Z","dependency_job_id":null,"html_url":"https://github.com/hardmaru/estool","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hardmaru%2Festool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hardmaru%2Festool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hardmaru%2Festool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hardmaru%2Festool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hardmaru","download_url":"https://codeload.github.com/hardmaru/estool/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248745101,"owners_count":21155006
,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T23:00:57.520Z","updated_at":"2025-04-13T16:36:26.781Z","avatar_url":"https://github.com/hardmaru.png","language":"Jupyter Notebook","readme":"# ESTool\n\n\u003ccenter\u003e\n\u003cimg src=\"https://cdn.jsdelivr.net/gh/hardmaru/pybullet_animations@f6f7fcd72ded6b1772b1b21462dff69e93f94520/anim/biped/biped_cma.gif\" width=\"100%\"/\u003e\n\u003ci\u003eEvolved Biped Walker.\u003c/i\u003e\u003cbr/\u003e\n\u003c/center\u003e\n\u003cp\u003e\u003c/p\u003e\n\nImplementations of various Evolution Strategies, such as GA, Population-based REINFORCE (Section 6 of [Williams 1992](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf)), CMA-ES, and OpenAI's ES, using a common interface.\n\nThe CMA-ES implementation wraps [pycma](https://github.com/CMA-ES/pycma).\n\n# Notes\n\nThe tool was last tested using the following configuration:\n\n- NumPy 1.13.3 (1.14 emits some annoying warnings).\n\n- OpenAI Gym 0.9.4 (breaks for 0.10.0+ since they changed the API).\n\n- cma 2.2.0 (any 2.x release should work).\n\n- PyBullet 1.6.3 (newer versions might work, but have not been tested).\n\n- Python 3, although 2 might work.\n\n- mpi4py 2\n\n## Background Reading:\n\n[A Visual Guide to Evolution Strategies](http://blog.otoro.net/2017/10/29/visual-evolution-strategies/)\n\n[Evolving Stable Strategies](http://blog.otoro.net/2017/11/12/evolving-stable-strategies/)\n\n## Using the Evolution Strategies Library\n\nTo use es.py, please check out the `simple_es_example.ipynb` notebook.\n\nThe basic concept is:\n\n```\nsolver = 
EvolutionStrategy()\nwhile True:\n\n  # ask the ES to give us a set of candidate solutions\n  solutions = solver.ask()\n\n  # create an array to hold the rewards.\n  # solver.popsize = population size\n  rewards = np.zeros(solver.popsize)\n\n  # calculate the reward for each given solution\n  # using your own evaluate() method\n  for i in range(solver.popsize):\n    rewards[i] = evaluate(solutions[i])\n\n  # give rewards back to ES\n  solver.tell(rewards)\n\n  # get best parameter, reward from ES\n  reward_vector = solver.result()\n\n  if reward_vector[1] \u003e MY_REQUIRED_REWARD:\n    break\n```\n\n## Parallel Processing Training with MPI\n\nPlease read the [Evolving Stable Strategies](http://blog.otoro.net/2017/11/12/evolving-stable-strategies/) article for more demos and use cases.\n\nTo use the training tool (relies on MPI):\n\n```\npython train.py bullet_racecar -n 8 -t 4\n```\n\nThis will launch training jobs with 32 workers (8 MPI processes x 4 threads each). The best model will be saved as a .json file in log/. This model should train in a few minutes on a 2014 MacBook Pro.\n\nIf you have more compute and access to a 64-core CPU machine, I recommend:\n\n```\npython train.py name_of_environment -e 16 -n 64 -t 4\n```\n\nThis will calculate fitness values based on an average of 16 random runs, on 256 workers (64 MPI processes x 4). 
In my experience this works reasonably well for most tasks inside `config.py`.\n\nAfter training, to run pre-trained models:\n\n```\npython model.py bullet_ant log/name_of_your_json_file.json\n```\n\n### Self-Contained Cartpole Swingup Task\n\n\u003ccenter\u003e\n\u003cimg src=\"https://rawcdn.githack.com/hardmaru/estool/6cf3b91a0bd840286002884b6a3fa56887ca7e2c/img/cartpole_swingup.gif\" width=\"100%\"/\u003e\u003cbr/\u003e\n\u003c/center\u003e\n\nIf you don't want to install a physics engine, try it on the `cartpole_swingup` task that doesn't have any dependencies:\n\nTraining command:\n\n```\npython train.py cartpole_swingup -n 8 -e 1 -t 4 --sigma_init 1.0\n```\n\nAfter 400 generations, the final average score (over 32 trials) should be over 900. You can run it with this command:\n\n```\npython model.py cartpole_swingup log/cartpole_swingup.cma.1.32.best.json\n```\n\nIf you haven't bothered to run the previous training command, you can load the pre-trained version:\n\n```\npython model.py cartpole_swingup zoo/cartpole_swingup.cma.json\n```\n\n### Self-Contained Slime Volleyball Gym Environment\n\n\u003ccenter\u003e\n\u003cimg src=\"https://otoro.net/img/slimegym/state.gif\" width=\"100%\"/\u003e\u003cbr/\u003e\n\u003c/center\u003e\n\nHere is an example for training [slime volleyball gym](https://github.com/hardmaru/slimevolleygym) environment:\n\nTraining command:\n\n```\npython train.py slimevolley -n 8 -e 8 -t 4 --sigma_init 0.5\n```\n\nPre-trained model:\n\n```\npython model.py slimevolley zoo/slimevolley.cma.64.96.best.json\n```\n\n### PyBullet Envs\n\n\u003ccenter\u003e\n\u003c!--\u003cimg src=\"{{ site.baseurl }}/assets/20171109/biped/bipedcover.gif\" width=\"100%\"/\u003e\u003cbr/\u003e--\u003e\n\u003c!--\u003cimg src=\"{{ site.baseurl }}/assets/20171109/kuka/kuka.gif\" width=\"100%\"/\u003e\u003cbr/\u003e--\u003e\n\u003cimg 
src=\"https://cdn.jsdelivr.net/gh/hardmaru/pybullet_animations@f6f7fcd72ded6b1772b1b21462dff69e93f94520/anim/robo/bullet_ant_demo.gif\" width=\"50%\"/\u003e\u003cbr/\u003e\n\u003ci\u003ebullet_ant pybullet environment. Population-based REINFORCE.\u003c/i\u003e\u003cbr/\u003e\n\u003c/center\u003e\n\u003cp\u003e\u003c/p\u003e\n\nAnother example: to run a minitaur duck model locally:\n\n```\npython model.py bullet_minitaur_duck zoo/bullet_minitaur_duck.cma.256.json\n```\n\n\u003ccenter\u003e\n\u003c!--\u003cimg src=\"{{ site.baseurl }}/assets/20171109/biped/bipedcover.gif\" width=\"100%\"/\u003e\u003cbr/\u003e--\u003e\n\u003c!--\u003cimg src=\"{{ site.baseurl }}/assets/20171109/kuka/kuka.gif\" width=\"100%\"/\u003e\u003cbr/\u003e--\u003e\n\u003cimg src=\"https://cdn.jsdelivr.net/gh/hardmaru/pybullet_animations@8a6ccaf53456f6fa9e85e258e10f9fa917261571/anim/minitaur/duck_normal_small.gif\" width=\"100%\"/\u003e\u003cbr/\u003e\n\u003ci\u003eCustom Minitaur Env.\u003c/i\u003e\u003cbr/\u003e\n\u003c/center\u003e\n\u003cp\u003e\u003c/p\u003e\n\n\nIn the .hist.json file, and on the screen output, we track the progress of training. The ordering of the fields is:\n\n- generation count\n- time (seconds) taken so far\n- average fitness\n- worst fitness\n- best fitness\n- average standard deviation of params\n- average timesteps taken\n- max timesteps taken\n\nUsing `plot_training_progress.ipynb` in an IPython notebook, you can plot the training logs for the `.hist.json` files. For example, in the `bullet_ant` task:\n\n\u003ccenter\u003e\n\u003cimg src=\"https://cdn.jsdelivr.net/gh/hardmaru/pybullet_animations@5a3847d0bd8407781dc931fdff2fc80f0315ab20/svg/bullet_ant.svg\" width=\"100%\"/\u003e\u003cbr/\u003e\n\u003ci\u003eBullet Ant training progress.\u003c/i\u003e\u003cbr/\u003e\n\u003c/center\u003e\n\u003cp\u003e\u003c/p\u003e\n\nYou need to install mpi4py, pybullet, gym, etc. to use the various environments. 
You also need roboschool/Box2D for some of the OpenAI gym envs.\n\nOn Windows, it is easiest to install mpi4py as follows:\n\n- Download and install mpi_x64.Msi from the HPC Pack 2012 MS-MPI Redistributable Package\n- Install a recent Visual Studio version with the C++ compiler\n- Open a command prompt\n\n```\ngit clone https://github.com/mpi4py/mpi4py\ncd mpi4py\npython setup.py install\n```\n\nThen modify the train.py script, replacing `mpirun` with `mpiexec` and `-np` with `-n`.\n\n### Citation\n\nIf you find this work useful, please cite it as:\n\n```\n@article{ha2017evolving,\n  title   = \"Evolving Stable Strategies\",\n  author  = \"Ha, David\",\n  journal = \"blog.otoro.net\",\n  year    = \"2017\",\n  url     = \"http://blog.otoro.net/2017/11/12/evolving-stable-strategies/\"\n}\n```\n","funding_links":[],"categories":["Evolutionary Algorithms (EA)","Machine Learning Framework","Jupyter Notebook"],"sub_categories":["RL/DRL Books","General Purpose Framework"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhardmaru%2Festool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhardmaru%2Festool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhardmaru%2Festool/lists"}