https://github.com/vita-group/architecture_convergence
[NeurIPS 2022] "Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis" by Wuyang Chen*, Wei Huang*, Xinyu Gong, Boris Hanin, Zhangyang Wang
convergence-analysis neural-architectures ntk
- Host: GitHub
- URL: https://github.com/vita-group/architecture_convergence
- Owner: VITA-Group
- License: mit
- Created: 2022-05-11T16:49:43.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-11T18:31:18.000Z (almost 3 years ago)
- Last Synced: 2025-03-29T09:42:09.626Z (6 months ago)
- Topics: convergence-analysis, neural-architectures, ntk
- Language: Python
- Size: 543 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
# Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [[PDF](https://arxiv.org/pdf/2205.05662.pdf)]
Wuyang Chen*, Wei Huang*, Xinyu Gong, Boris Hanin, Zhangyang Wang
In NeurIPS 2022.
[code under development]
## Overview
We link the convergence rate of a neural network to its architecture topology (its connectivity pattern), and use this link to guide efficient neural architecture design.
Highlights:
* We first theoretically analyze the convergence of gradient descent for diverse neural network architectures, and find that the connectivity pattern largely determines the bound on the convergence rate.
* From this analysis, we distill two practical principles for designing a network's connectivity pattern: "effective depth" $\bar{d}$ and "effective width" $\bar{m}$ (see the illustrative sketch after this list).
* Both our convergence analysis and the effective depth/width principles are verified by experiments on diverse architectures and datasets. Our method can further significantly accelerate neural architecture search without introducing any extra cost.
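The paper defines $\bar{d}$ and $\bar{m}$ precisely from the connectivity pattern. As a rough, purely illustrative proxy (an assumption of this sketch, not the paper's formulas), one can enumerate the input-to-output paths of an architecture's DAG and look at how many there are and how long they are; skip connections shorten the typical path while multiplying the number of parallel paths.

```python
from collections import Counter

def path_length_stats(adj, source, sink):
    """Count source->sink paths in a small DAG (adjacency dict {node: [successors]}),
    grouped by path length. Purely illustrative: the paper's effective depth/width
    are derived analytically from the connectivity pattern, not by enumeration."""
    lengths = Counter()

    def dfs(node, depth):
        if node == sink:
            lengths[depth] += 1
            return
        for nxt in adj.get(node, []):
            dfs(nxt, depth + 1)

    dfs(source, 0)
    return lengths

# A 4-node chain 0 -> 1 -> 2 -> 3 with two skip connections, 0 -> 2 and 1 -> 3.
stats = path_length_stats({0: [1, 2], 1: [2, 3], 2: [3]}, source=0, sink=3)
print(stats)  # Counter({2: 2, 3: 1}): two length-2 paths, one length-3 path
n_paths = sum(stats.values())
print(n_paths, sum(l * c for l, c in stats.items()) / n_paths)  # 3 paths, average length 2.33
```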
## Prerequisites
- Ubuntu 16.04
- Python 3.6.9
- CUDA 10.1 (lower versions may work but were not tested)
- NVIDIA GPU + cuDNN v7.3

This repository has been tested on a GTX 1080 Ti. Configurations may need to be changed on different platforms.
## Usage
### 1. `mlp_code`
This code base is for training an MLP network defined by an arbitrary DAG (directed acyclic graph), e.g., the three examples in our Figure 3.
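Below is a minimal sketch of what "an MLP defined by a DAG" can look like, in plain PyTorch; the class and argument names are hypothetical and are not the interface of `mlp_code`. Each non-input node applies a linear layer and ReLU to the sum of its predecessors' outputs.

```python
import torch
import torch.nn as nn

class DAGMLP(nn.Module):
    """Illustrative sketch (not the repo's implementation) of an MLP whose hidden
    nodes are wired by an arbitrary DAG: every non-input node applies Linear + ReLU
    to the sum of its predecessors' outputs."""

    def __init__(self, in_dim, hidden_dim, out_dim, edges, num_nodes):
        super().__init__()
        # edges: list of (src, dst) with src < dst; node 0 is the input node,
        # node num_nodes - 1 feeds the classifier head. Every non-input node is
        # assumed to have at least one incoming edge.
        self.edges = edges
        self.num_nodes = num_nodes
        self.input_proj = nn.Linear(in_dim, hidden_dim)
        self.node_layers = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_nodes - 1)]
        )
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        feats = [torch.relu(self.input_proj(x))]          # output of node 0
        for j in range(1, self.num_nodes):
            preds = [feats[src] for src, dst in self.edges if dst == j]
            agg = torch.stack(preds).sum(dim=0)           # aggregate predecessor outputs
            feats.append(torch.relu(self.node_layers[j - 1](agg)))
        return self.head(feats[-1])

# A 4-node chain 0 -> 1 -> 2 -> 3 with one extra skip edge 0 -> 3.
net = DAGMLP(in_dim=32, hidden_dim=64, out_dim=10,
             edges=[(0, 1), (1, 2), (2, 3), (0, 3)], num_nodes=4)
print(net(torch.randn(8, 32)).shape)  # torch.Size([8, 10])
```

In this sketch, changing only `edges` changes the connectivity pattern (and hence the effective depth/width) while the parameter count, which depends only on `num_nodes`, stays the same.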
### 2. TENAS + DAG

Modified from [TENAS](https://github.com/VITA-Group/TENAS).
On top of TENAS, we further reduce the search cost by avoiding evaluating the supernet of moderate depth and width.
### 3. WOT + DAG

Modified from [WOT](https://github.com/BayesWatch/nas-without-training).
On top of WOT (Neural Architecture Search Without Training), we further reduce the search cost by avoiding evaluating bad architectures of extreme depth or extreme width.
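A minimal sketch of such a pre-screening step is shown below. The depth/width measures (longest input-to-output path and maximum fan-in), the thresholds, and the candidate encoding are all assumptions of this illustration, not the repo's actual code; surviving candidates would then be scored by WOT's training-free metric as usual.

```python
def dag_depth(edges, num_nodes):
    """Longest input-to-output path in a DAG with nodes 0..num_nodes-1 and edges (src, dst), src < dst."""
    longest = [0] * num_nodes
    for src, dst in sorted(edges, key=lambda e: e[1]):  # process edges in topological order of dst
        longest[dst] = max(longest[dst], longest[src] + 1)
    return longest[-1]

def dag_width(edges, num_nodes):
    """A simple width proxy: the largest fan-in over all non-input nodes."""
    fan_in = [0] * num_nodes
    for _, dst in edges:
        fan_in[dst] += 1
    return max(fan_in[1:])

def prescreen(candidates, depth_range=(2, 8), width_range=(1, 4)):
    """Drop candidates whose depth or width proxy is extreme (outside the given ranges),
    so only 'moderate' architectures are scored. Thresholds are placeholders."""
    return [
        (edges, n) for edges, n in candidates
        if depth_range[0] <= dag_depth(edges, n) <= depth_range[1]
        and width_range[0] <= dag_width(edges, n) <= width_range[1]
    ]

# Example: a plain 4-node chain vs. a densely connected 4-node DAG.
candidates = [
    ([(0, 1), (1, 2), (2, 3)], 4),
    ([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)], 4),
]
print([dag_depth(e, n) for e, n in candidates])          # [3, 3]
print([dag_width(e, n) for e, n in candidates])          # [1, 3]
print(len(prescreen(candidates, width_range=(1, 2))))    # 1: the dense DAG is screened out
```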
## Citation

```
@article{chen2022deep,
  title={Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis},
  author={Chen, Wuyang and Huang, Wei and Gong, Xinyu and Hanin, Boris and Wang, Zhangyang},
  journal={Advances in Neural Information Processing Systems},
  year={2022}
}
```