https://github.com/lucidrains/hyper-connections

Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public

artificial-intelligence deep-learning residuals

Last synced: 6 months ago

Host: GitHub
URL: https://github.com/lucidrains/hyper-connections
Owner: lucidrains
License: mit
Created: 2024-12-24T18:14:23.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2026-01-15T01:04:46.000Z (6 months ago)
Last Synced: 2026-01-15T07:43:13.198Z (6 months ago)
Topics: artificial-intelligence, deep-learning, residuals
Language: Python
Homepage:
Size: 362 KB
Stars: 140
Watchers: 2
Forks: 13
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

## Hyper Connections

An attempt to make multiple residual streams, proposed in the [Hyper-Connections paper](https://arxiv.org/abs/2409.19606) from ByteDance's AI lab, accessible as an easy-to-use library, and to follow any new research in this direction.

[Write up on mHC from Subhadip Mitra](https://subhadipmitra.com/blog/2026/deepseek-mhc-manifold-constrained-hyper-connections/)

## Install

```bash
$ pip install hyper-connections
```

## Usage

```python
import torch
from torch import nn

# a single branch layer

branch = nn.Linear(512, 512)

# before

residual = torch.randn(2, 1024, 512)

residual = branch(residual) + residual

# after, say 4 streams in paper

from hyper_connections import get_init_and_expand_reduce_stream_functions

init_hyper_conn, expand_stream, reduce_stream = get_init_and_expand_reduce_stream_functions(4)

# 1. wrap your branch function

hyper_conn_branch = init_hyper_conn(dim = 512, branch = branch)

# 2. expand to 4 streams, this must be done before your trunk, typically a for-loop with many branch functions

residual = expand_stream(residual)

# 3. forward your residual as usual into the wrapped branch function(s)

residual = hyper_conn_branch(residual)

# 4. reduce 4 streams with a summation, this has to be done after your for-loop trunk. for transformer, unsure whether to do before or after final norm

residual = reduce_stream(residual)
```
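
The ordering in steps 2 to 4 can be illustrated without the library. The sketch below is a dependency-free toy on plain Python lists, not the library's implementation. It assumes that expanding means copying the residual once per stream (only the summation in step 4 is stated above), and it uses a fixed wiring where the real wrapped branch would learn its weights. The names `toy_expand`, `toy_reduce`, and `toy_wrapped_branch` are hypothetical.

```python
# Toy illustration of expand -> trunk -> reduce on plain lists.
# All names here are hypothetical; this is not the hyper_connections API.

def toy_expand(residual, num_streams):
    # assumption: each stream starts as a copy of the incoming residual
    return [list(residual) for _ in range(num_streams)]

def toy_reduce(streams):
    # sum the streams elementwise back into a single residual
    return [sum(values) for values in zip(*streams)]

def toy_wrapped_branch(branch, read_from):
    # simplest fixed wiring: read one stream, add the branch output to every stream
    def forward(streams):
        out = branch(streams[read_from])
        return [[s + o for s, o in zip(stream, out)] for stream in streams]
    return forward

def double(x):
    # stand-in for a real branch such as attention or a feedforward
    return [2.0 * v for v in x]

num_streams = 4
layers = [toy_wrapped_branch(double, read_from = i % num_streams) for i in range(3)]

streams = toy_expand([1.0, -1.0], num_streams)  # expand once, before the trunk

for layer in layers:                             # the trunk: a for-loop over wrapped branches
    streams = layer(streams)

residual = toy_reduce(streams)                   # reduce once, after the trunk
```

With this fixed wiring every stream stays identical, so the reduced result is simply `num_streams` times what a plain residual trunk would produce. The streams only become distinct once the read and write weights differ per stream.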

Or do it manually, as in the paper:

```python
import torch
from torch import nn

# a single branch layer

branch = nn.Linear(512, 512)

# before

residual = torch.randn(2, 1024, 512)

residual = branch(residual) + residual

# after, say 4 streams in paper

from hyper_connections import get_init_and_expand_reduce_stream_functions

init_hyper_conn, expand_stream, reduce_stream = get_init_and_expand_reduce_stream_functions(4)

# 1. instantiate hyper connection with correct number of streams (4 in this case) - or use the init function above

hyper_conn = init_hyper_conn(dim = 512)

# 2. expand to 4 streams

residual = expand_stream(residual)

# 3. forward your residual into hyper connection for the branch input + add residual function (learned betas)

branch_input, add_residual = hyper_conn(residual)

branch_output = branch(branch_input)

residual = add_residual(branch_output)

# or you can do it in one line as so -> residual = hyper_conn.decorate_branch(branch)(residual)

# 4. reduce 4 streams with a summation, this has to be done after your for loop trunk

residual = reduce_stream(residual)
```
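
The two-step call in step 3 separates reading the branch input from writing the branch output back into the streams. Below is a minimal pure-Python sketch of that shape, assuming fixed scalar weights in place of the learned ones mentioned above: `alpha_read` mixes the streams into one branch input, `alpha_mix` mixes the streams among themselves, and `beta` scales the branch output added to each stream. The class `ToyHyperConnection`, these attribute names, and the chosen initial values are all hypothetical and not taken from the library.

```python
# Toy static hyper connection on plain lists.
# Hypothetical names and fixed weights; not the library's code.

class ToyHyperConnection:
    def __init__(self, num_streams, layer_index = 0):
        n = num_streams
        # read weights: one-hot, so the branch reads a single stream (assumed initialization)
        self.alpha_read = [1.0 if i == layer_index % n else 0.0 for i in range(n)]
        # stream-to-stream mixing: identity, so streams pass through unchanged
        self.alpha_mix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        # write weights: how strongly the branch output is added to each stream
        self.beta = [1.0] * n

    def __call__(self, streams):
        n = len(streams)
        dim = len(streams[0])

        # branch input is a weighted sum over the streams
        branch_input = [
            sum(self.alpha_read[i] * streams[i][d] for i in range(n))
            for d in range(dim)
        ]

        # mixed[j] = sum_i alpha_mix[i][j] * streams[i]
        mixed = [
            [sum(self.alpha_mix[i][j] * streams[i][d] for i in range(n)) for d in range(dim)]
            for j in range(n)
        ]

        def add_residual(branch_output):
            # each stream receives its own scaled copy of the branch output
            return [
                [m + self.beta[j] * o for m, o in zip(mixed[j], branch_output)]
                for j in range(n)
            ]

        return branch_input, add_residual

streams = [[1.0, 2.0], [3.0, 4.0]]
hyper_conn = ToyHyperConnection(num_streams = 2)

branch_input, add_residual = hyper_conn(streams)
branch_output = [10.0 * v for v in branch_input]  # stand-in branch
streams = add_residual(branch_output)
```

Returning a closure for the write step is what lets the caller run any branch in between, including one that takes extra arguments.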

To compare hyper connections against a plain residual without changing your code, pass `disable = True` when fetching the functions:

```python
get_init_and_expand_reduce_stream_functions(4, disable = True)
```
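
One way such a switch can leave the call sites untouched is for the disabled variant to hand back pass-through expand and reduce functions, plus a wrapper that performs the ordinary `branch(x) + x`. The toy below sketches that idea on plain lists. `get_toy_functions` and its behavior are assumptions for illustration, not the library's implementation.

```python
# Hypothetical toy showing how a `disable` flag can swap in a plain residual
# while the calling code stays the same. Not the hyper_connections implementation.

def get_toy_functions(num_streams, disable = False):
    if disable:
        def init(branch):
            # plain residual: branch(x) + x
            def forward(x):
                return [b + v for b, v in zip(branch(x), x)]
            return forward

        def identity(x):
            # expand and reduce become pass-throughs
            return x

        return init, identity, identity

    def init(branch):
        # fixed toy wiring: read the first stream, add the branch output to all streams
        def forward(streams):
            out = branch(streams[0])
            return [[v + o for v, o in zip(stream, out)] for stream in streams]
        return forward

    def expand(x):
        return [list(x) for _ in range(num_streams)]

    def reduce_streams(streams):
        return [sum(values) for values in zip(*streams)]

    return init, expand, reduce_streams

def halve(x):
    # stand-in branch
    return [0.5 * v for v in x]

def run(disable):
    # identical call sites for both settings
    init, expand, reduce_streams = get_toy_functions(4, disable = disable)
    layer = init(halve)
    return reduce_streams(layer(expand([2.0, 4.0])))

plain = run(disable = True)
multi = run(disable = False)
```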

To use the fractionated feature dimensions proposed in [a follow-up paper](https://arxiv.org/abs/2503.14125) by the same authors, instantiate with `num_fracs` greater than `1`:

```python
get_init_and_expand_reduce_stream_functions(1, num_fracs = 4) # also allows you to mix streams and fractions of feature dimension
```
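
As a rough picture of what a fraction is: instead of replicating the hidden state, the feature dimension itself is divided into parts. The sketch below splits a feature vector into `num_fracs` equal chunks and merges them back. `split_fracs` and `merge_fracs` are hypothetical helpers, not library functions, and the sketch assumes the feature dimension is divisible by `num_fracs`.

```python
# Hypothetical helpers illustrating fractions of the feature dimension.

def split_fracs(features, num_fracs):
    dim = len(features)
    # assumption: the feature dimension divides evenly into the fractions
    assert dim % num_fracs == 0, 'feature dimension must be divisible by num_fracs'
    size = dim // num_fracs
    return [features[i * size:(i + 1) * size] for i in range(num_fracs)]

def merge_fracs(fracs):
    # concatenate the chunks back into one feature vector
    return [v for frac in fracs for v in frac]

features = [float(i) for i in range(8)]

fracs = split_fracs(features, num_fracs = 4)  # 4 chunks of 2 features each
restored = merge_fracs(fracs)
```

Unlike extra streams, splitting this way does not increase the total number of features carried through the network.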

## Citation

```bibtex
@article{Zhu2024HyperConnections,
    title   = {Hyper-Connections},
    author  = {Defa Zhu and Hongzhi Huang and Zihao Huang and Yutao Zeng and Yunyao Mao and Banggu Wu and Qiyang Min and Xun Zhou},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2409.19606},
    url     = {https://api.semanticscholar.org/CorpusID:272987528}
}
```

```bibtex
@misc{Rubin2024,
    author  = {Ohad Rubin},
    url     = {https://medium.com/@ohadrubin/exploring-weight-decay-in-layer-normalization-challenges-and-a-reparameterization-solution-ad4d12c24950}
}
```

```bibtex
@article{Zhu2025FracConnectionsFE,
    title   = {Frac-Connections: Fractional Extension of Hyper-Connections},
    author  = {Defa Zhu and Hongzhi Huang and Jundong Zhou and Zihao Huang and Yutao Zeng and Banggu Wu and Qiyang Min and Xun Zhou},
    journal = {ArXiv},
    year    = {2025},
    volume  = {abs/2503.14125},
    url     = {https://api.semanticscholar.org/CorpusID:277104144}
}
```

```bibtex
@misc{xie2025mhcmanifoldconstrainedhyperconnections,
    title   = {mHC: Manifold-Constrained Hyper-Connections},
    author  = {Zhenda Xie and Yixuan Wei and Huanqi Cao and Chenggang Zhao and Chengqi Deng and Jiashi Li and Damai Dai and Huazuo Gao and Jiang Chang and Liang Zhao and Shangyan Zhou and Zhean Xu and Zhengyan Zhang and Wangding Zeng and Shengding Hu and Yuqing Wang and Jingyang Yuan and Lean Wang and Wenfeng Liang},
    year    = {2025},
    eprint  = {2512.24880},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url     = {https://arxiv.org/abs/2512.24880},
}
```
