https://github.com/vita-group/l2pg
[ICML 2023] "Learning to Optimize Differentiable Games" by Xuxi Chen, Nelson Vadori, Tianlong Chen, Zhangyang Wang
https://github.com/vita-group/l2pg
differentiable-games learning-to-optimize
Last synced: about 1 year ago
JSON representation
[ICML 2023] "Learning to Optimize Differentiable Games" by Xuxi Chen, Nelson Vadori, Tianlong Chen, Zhangyang Wang
- Host: GitHub
- URL: https://github.com/vita-group/l2pg
- Owner: VITA-Group
- Created: 2023-05-22T00:31:56.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-08-05T07:02:26.000Z (almost 3 years ago)
- Last Synced: 2025-03-27T01:35:32.402Z (about 1 year ago)
- Topics: differentiable-games, learning-to-optimize
- Language: Python
- Homepage:
- Size: 188 KB
- Stars: 9
- Watchers: 11
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Learning to Optimize Differentiable Games
Code for ICML 2023 paper: [Learning to Optimize Differentiable Games]().
## Overview
Many machine learning problems can be abstracted in solving game theory formulations and boil down to optimizing nested objectives, such as generative adversarial networks (GANs) and multi-agent reinforcement learning.
Solving these games requires finding their stable fixed points or Nash equilibrium. However, existing algorithms for solving games suffer from empirical instability, hence demanding heavy ad-hoc tuning in practice.
To tackle these challenges, we resort to the emerging scheme of $\textit{Learning to Optimize}$ (L2O), which discovers problem-specific efficient optimization algorithms through data-driven training. Our customized L2O framework for differentiable game theory problems, dubbed $\textit{``Learning to Play Games"}$ (L2PG), seeks a stable fixed point solution, by predicting the fast update direction from the past trajectory, with a novel gradient stability-aware, sign-based loss function. We further incorporate curriculum learning and self-learning to strengthen the empirical training stability and generalization of L2PG. On test problems including quadratic games and GANs, L2PG can substantially accelerate the convergence, and demonstrates a remarkably more stable trajectory.
## Experiments
### Meta-Testing
We provide checkpoints of the optimizers under the `checkpoints` folder.
#### Two-player games (stable)
```bash
python sga_l2o_batch.py --eval_game_list stable_game_list_normal.txt --checkpoint checkpoints/two_player.pkl --n_hidden 32 --formula grad,A,S --feat_level o,m0.5,m0.9,m0.99,mt,gt
```
#### Four-player games
python sga_l2o_batch_four_player.py --checkpoint checkpoints/four_player.pkl --n_hidden 32 --formula grad,S,A --feat_level o,m0.5,m0.9,m0.99,mt,gt --n_player 4
#### GANs
### Meta-Training
#### Two-player games
```bash
python -u sga_l2o_train.py --n_hidden 32 --game-distribution gaussian --formula grad,S,A --inner-iterations 100 --feat_level o,m0.5,m0.9,m0.99,mt,gt --unroll_length 10 --output-name all_cl_0.1_reg_2_0.95.pkl --eval-game-list stable_game_list_new.txt --init-mode ball --use-slow-optimizer --use-slow-ema --slow-optimizer-freq 5 --epoch 300 --wandb-name all_cl_0.1_reg_2_0.95 --slow-ema 0.95 --slow-optimizer-start 0.1 --cl --reg_2 --data-cl --batch-size 128
```