https://github.com/agora-lab-ai/star
Implementation of the paper from Liquid AI
- Host: GitHub
- URL: https://github.com/agora-lab-ai/star
- Owner: Agora-Lab-AI
- License: mit
- Created: 2024-12-03T17:18:26.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-03T08:18:39.000Z (4 months ago)
- Last Synced: 2025-03-27T15:21:27.118Z (3 months ago)
- Language: Python
- Size: 2.17 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# STAR: Structured Token-mixing Adaptive Residual Networks
[Discord](https://discord.gg/agora-999382051935506503) · [YouTube](https://www.youtube.com/@kyegomez3242) · [LinkedIn](https://www.linkedin.com/in/kye-g-38759a207/) · [X](https://x.com/kyegomezb)
STAR is a novel neural network architecture that implements Linear Input-Varying (LIV) operators with structured token mixing patterns. This architecture provides an efficient approach to sequence modeling by combining different mixing structures with adaptive residual connections.
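As a shape-level sketch of the "adaptive residual connections with pre-norm" idea mentioned here, one block step can be written as `y = x + alpha * mixer(norm(x))`. The helper below is an illustration only (names and the scalar `alpha` gate are assumptions, not the package's code):

```python
import numpy as np

def prenorm_residual(x, mixer, alpha=1.0):
    # Pre-norm residual: normalize first, mix, then add back to the input,
    # scaled by an adaptive gate `alpha` (a plain scalar in this sketch).
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True)
    return x + alpha * mixer((x - mu) / (sigma + 1e-5))

x = np.random.randn(4, 8)          # (seq_len, dim)
y = prenorm_residual(x, lambda h: 0.5 * h, alpha=0.9)
```

The residual path leaves the input shape unchanged, which is what lets layers with different mixing structures be stacked freely.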
## Key Features
- Flexible token mixing structures (Diagonal, Low-Rank, Scaled-Toeplitz, Sequential Semi-Separable)
- Configurable channel mixing patterns (Diagonal, Dense, Grouped)
- Feature sharing mechanisms for improved efficiency
- Adaptive residual connections with pre-norm architecture
- Genome-based architecture specification

## Installation
```bash
pip install star-backbone
```

## Quick Start
```python
from star import STARBackbone, LIVConfig, TokenMixingStructure, ChannelMixingStructure

# Configure model
dim = 512
depth = 24

# Define genome
genome = [
    [1, 1, 1, 1, 1],  # SA-1
    [9, 1, 1, 1, 1],  # GMemless
    [1, 2, 1, 2, 1],  # SA-1 with sharing
]

# Configure operators
configs = {
    1: LIVConfig(
        featurizer_class=1,
        token_mixing=TokenMixingStructure.LOW_RANK,
        sparsity_mask=False,
        nonlinearity="softmax",
        channel_mixing=ChannelMixingStructure.GROUPED,
    ),
    9: LIVConfig(
        featurizer_class=9,
        token_mixing=TokenMixingStructure.DIAGONAL,
        sparsity_mask=False,
        nonlinearity="silu",
        channel_mixing=ChannelMixingStructure.DENSE,
    ),
}

# Create model
model = STARBackbone(dim, depth, genome, configs)
```

## Architecture Details
### LIV Operators
The core building blocks are Linear Input-Varying (LIV) operators that combine:
- Token mixing structures for sequence interaction
- Channel mixing patterns for feature transformation
- Nonlinear activations
- Optional sparsity masks

### Token Mixing Structures
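As a rough intuition for the structures listed below, here is a toy NumPy sketch (not the package's implementation) of how a diagonal versus a low-rank token-mixing operator acts on a sequence:

```python
import numpy as np

seq_len, dim, rank = 6, 4, 2
x = np.random.randn(seq_len, dim)  # (tokens, channels)

# DIAGONAL: each token is scaled independently -- no cross-token interaction.
diag_gate = np.random.randn(seq_len, 1)
y_diag = diag_gate * x

# LOW_RANK: the seq_len x seq_len mixing matrix factors as A @ B with
# rank << seq_len, the same structure attention builds from Q/K projections.
A = np.random.randn(seq_len, rank)
B = np.random.randn(rank, seq_len)
y_lowrank = (A @ B) @ x
```

Both produce an output of the same shape as the input; they differ only in which tokens are allowed to influence each other.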
- **DIAGONAL**: Element-wise scaling
- **LOW_RANK**: Attention-like mechanisms with Q/K/V projections
- **SCALED_TOEPLITZ**: Convolution-based local mixing
- **SEQUENTIAL_SEMI_SEPARABLE**: Recurrent processing with gating

### Channel Mixing Types
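The three patterns listed below differ in how much cross-channel interaction they allow. A minimal sketch (plain NumPy, purely illustrative):

```python
import numpy as np

dim, groups = 8, 2
x = np.random.randn(dim)  # one token's channel vector

# DIAGONAL: independent per-channel scaling.
y_diag = np.random.randn(dim) * x

# DENSE: full dim x dim interaction between all channels.
y_dense = np.random.randn(dim, dim) @ x

# GROUPED: block-diagonal mixing -- channels interact only within their group.
g = dim // groups
blocks = [np.random.randn(g, g) for _ in range(groups)]
y_grouped = np.concatenate([W @ x[i * g:(i + 1) * g] for i, W in enumerate(blocks)])
```

Grouped mixing sits between the two extremes, trading expressivity for parameter and compute savings.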
- **DIAGONAL**: Independent channel scaling
- **DENSE**: Full channel interaction
- **GROUPED**: Group-wise channel mixing

## Genome Specification
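The 5-integer encoding described below can be decoded with a small helper. This is a hypothetical illustration (the function and field names are assumptions based on the field list, not the package's API):

```python
# Hypothetical decoder for the 5-integer genome entries shown in Quick Start.
def decode_gene(gene):
    op_class, featurizer_group, _reserved1, feature_group, _reserved2 = gene
    return {
        "operator_class": op_class,            # which LIV operator config to use
        "featurizer_sharing_group": featurizer_group,
        "feature_sharing_group": feature_group,
    }

genome = [
    [1, 1, 1, 1, 1],  # SA-1
    [9, 1, 1, 1, 1],  # GMemless
]
decoded = [decode_gene(g) for g in genome]
```

Layers with the same sharing-group value would reuse the corresponding featurizer or features, which is where the efficiency gains come from.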
Each layer is specified by a 5-integer sequence:
1. LIV operator class ID
2. Featurizer sharing group
3. Reserved
4. Feature sharing group
5. Reserved

## Configuration
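For concreteness, the fields listed below suggest a dataclass along these lines. This is a sketch reconstructed from the Quick Start example and the field list, not the package's exact definition:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class TokenMixingStructure(Enum):
    DIAGONAL = auto()
    LOW_RANK = auto()
    SCALED_TOEPLITZ = auto()
    SEQUENTIAL_SEMI_SEPARABLE = auto()

class ChannelMixingStructure(Enum):
    DIAGONAL = auto()
    DENSE = auto()
    GROUPED = auto()

@dataclass
class LIVConfig:
    featurizer_class: int
    token_mixing: TokenMixingStructure
    channel_mixing: ChannelMixingStructure
    sparsity_mask: bool = False
    nonlinearity: Optional[str] = None
    expansion_factor: int = 1   # default values here are assumptions
    repeat_factor: int = 1

cfg = LIVConfig(1, TokenMixingStructure.LOW_RANK, ChannelMixingStructure.GROUPED,
                nonlinearity="softmax")
```

Keeping the config as a plain dataclass makes each operator class in the `configs` dict self-describing and easy to serialize.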
The `LIVConfig` dataclass specifies:
- `featurizer_class`: Integer ID for featurizer type
- `token_mixing`: TokenMixingStructure enum value
- `channel_mixing`: ChannelMixingStructure enum value
- `sparsity_mask`: Boolean for optional sparsity
- `nonlinearity`: Optional activation function name
- `expansion_factor`: Channel expansion multiplier
- `repeat_factor`: Feature repeat factor

## Contributing
1. Fork the repository
2. Create feature branch (`git checkout -b feature/name`)
3. Commit changes (`git commit -am 'Add feature'`)
4. Push branch (`git push origin feature/name`)
5. Open Pull Request

## License
MIT License. See LICENSE file for details.
## Citation
If you use STAR in your research, please cite:
```bibtex
@article{star2024,
  title={STAR: Structured Token-mixing Adaptive Residual Networks},
  author={[Authors]},
  journal={[Journal]},
  year={2024}
}
```