{"id":31211488,"url":"https://github.com/cyberagentailab/m2wu","last_synced_at":"2025-09-21T05:27:27.227Z","repository":{"id":109618289,"uuid":"604578757","full_name":"CyberAgentAILab/m2wu","owner":"CyberAgentAILab","description":null,"archived":false,"fork":false,"pushed_at":"2023-03-22T07:19:26.000Z","size":5408,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-10T07:42:53.921Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CyberAgentAILab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-02-21T11:05:16.000Z","updated_at":"2024-05-17T06:49:22.000Z","dependencies_parsed_at":"2023-03-08T05:15:12.729Z","dependency_job_id":null,"html_url":"https://github.com/CyberAgentAILab/m2wu","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CyberAgentAILab/m2wu","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fm2wu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fm2wu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fm2wu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fm2wu/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/CyberAgentAILab","download_url":"https://codeload.github.com/CyberAgentAILab/m2wu/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fm2wu/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274496576,"owners_count":25296405,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-10T02:00:12.551Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-21T05:27:21.652Z","updated_at":"2025-09-21T05:27:26.099Z","avatar_url":"https://github.com/CyberAgentAILab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games\r\nCode for reproducing results in the paper \"[Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games](https://arxiv.org/abs/2208.09855)\".\r\n\r\n## About\r\nThis paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings.\r\nIn the former, players observe their exact gradient vectors of the utility functions.\r\nIn the latter, they only observe the noisy gradient vectors.\r\nEven the 
celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may not converge to a Nash equilibrium with noisy feedback.\r\nIn contrast, M2WU exhibits last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings.\r\nWe then prove that it converges to an exact Nash equilibrium by iteratively adapting the mutation term.\r\nWe empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.\r\n\r\n## Installation\r\nThis code is written in Python 3.\r\nTo install the required dependencies, execute the following command:\r\n```bash\r\n$ pip install -r requirements.txt\r\n```\r\n\r\n### For Docker Users\r\nBuild the container:\r\n```bash\r\n$ docker build -t m2wu .\r\n```\r\nAfter the build finishes, run the container:\r\n```bash\r\n$ docker run -it m2wu\r\n```\r\n\r\n## Run Experiments\r\nTo investigate the performance of M2WU in biased Rock-Paper-Scissors with full feedback, execute the following commands:\r\n```bash\r\n$ python run_experiment.py --num_trials 10 --T 100000 --random_init_strategy --algorithm m2wu\r\n$ python run_experiment.py --num_trials 10 --T 100000 --random_init_strategy --algorithm m2wu --update_freq 100\r\n```\r\nIn this experiment, the following options can be specified:\r\n* `--game`: Name of a matrix game. The default value is `biased_rps`.\r\n* `--algorithm`: Learning algorithm.\r\n* `--num_trials`: Number of trials to run. The default value is `1`.\r\n* `--T`: Number of iterations. The default value is `10000`.\r\n* `--feedback`: Feedback type given to players. The default value is `full`.\r\n* `--seed`: Random seed. The default value is `0`.\r\n* `--random_init_strategy`: Whether to generate the initial strategy uniformly at random. The default value is `False`.\r\n* `--eta`: Learning rate. The default value is `0.1`.\r\n* `--decay`: Whether to use decreasing learning rates. 
The default value is `False`.\r\n* `--mu`: Mutation rate for M2WU. The default value is `0.1`.\r\n* `--update_freq`: Update the reference strategies in M2WU every N iterations. A value of `None` means that the reference strategies are never updated. The default value is `None`.\r\n\r\nTo evaluate M2WU in biased Rock-Paper-Scissors with noisy feedback, execute the following commands:\r\n```bash\r\n$ python run_experiment.py --num_trials 10 --T 1000000 --feedback noisy --algorithm m2wu --eta 0.001 --mu 0.1\r\n$ python run_experiment.py --num_trials 10 --T 1000000 --feedback noisy --algorithm m2wu --eta 0.001 --mu 0.5 --update_freq 20000\r\n```\r\n\r\n## Citation\r\nIf you use our code in your work, please cite our paper:\r\n```\r\n@article{abe2022m2wu,\r\n  title={Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games},\r\n  author={Abe, Kenshi and Ariu, Kaito and Sakamoto, Mitsuki and Toyoshima, Kentaro and Iwasaki, Atsushi},\r\n  journal={arXiv preprint arXiv:2208.09855},\r\n  year={2022}\r\n}\r\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberagentailab%2Fm2wu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyberagentailab%2Fm2wu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberagentailab%2Fm2wu/lists"}