https://github.com/sunsvea/cl-sparse-transformer
A CL (coulstock) sparse transformer implementation
- Host: GitHub
- URL: https://github.com/sunsvea/cl-sparse-transformer
- Owner: Sunsvea
- License: mit
- Created: 2025-01-31T12:53:55.000Z
- Default Branch: main
- Last Pushed: 2025-01-31T13:33:16.000Z
- Last Synced: 2025-04-03T12:16:43.677Z
- Topics: open-source, pytorch, sparse-transformer
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Sparse Transformer Implementation
This repository contains a PyTorch implementation of a Transformer model with sparse attention patterns. The goal is to explore and implement sparse attention mechanisms that make transformer models more efficient without degrading model quality.
## Current Features
- Local sparse attention mechanism (window-based)
- Configurable model architecture (layers, heads, dimensions)
- Basic positional encoding
- Simple training loop for sequence prediction
- CPU support
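To make the efficiency argument behind window-based attention concrete, the sketch below counts how many query-key pairs are scored under full attention versus a local window. It assumes a symmetric window in which `window` is a one-sided radius. The function name `attended_pairs` and that window convention are illustrative assumptions and are not taken from this repository's code.

```python
def attended_pairs(seq_len: int, window: int) -> tuple[int, int]:
    """Return (full, local) counts of query-key pairs that get scored.

    `window` is a hypothetical one-sided radius: token i attends to
    every token j with |i - j| <= window.
    """
    full = seq_len * seq_len
    # Each token sees at most 2 * window + 1 positions, fewer near the edges.
    local = sum(
        min(seq_len - 1, i + window) - max(0, i - window) + 1
        for i in range(seq_len)
    )
    return full, local


full, local = attended_pairs(seq_len=1024, window=16)
print(f"full: {full}, local: {local}, ratio: {local / full:.3f}")
```

Under these assumptions, a 1,024-token sequence with a radius of 16 scores 33,520 pairs instead of 1,048,576, roughly 3% of the full attention matrix.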
## Installation
```bash
# Clone the repository
git clone https://github.com/Sunsvea/cl-sparse-transformer.git
cd cl-sparse-transformer
# Install dependencies
pip install torch
```
## Quick Start
```bash
# Run the example training script
python sparse_transformer.py
```
## Output
When running the training script, you should see output similar to this:
```
2025-01-31 10:15:23,456 - INFO - Starting training...
2025-01-31 10:15:23,789 - INFO - Generated 1000 sample sequences...
2025-01-31 10:15:23,901 - INFO - Split data into 800 train and 200 validation sequences
2025-01-31 10:15:24,123 - INFO - Epoch 1/5
2025-01-31 10:15:24,456 - INFO - Batch 0, Loss: 4.6573
2025-01-31 10:15:24,789 - INFO - Batch 10, Loss: 4.3291
2025-01-31 10:15:25,012 - INFO - Training Loss: 4.2845
2025-01-31 10:15:25,234 - INFO - Validation Loss: 4.1932
2025-01-31 10:15:25,345 - INFO - Saved new best model checkpoint
2025-01-31 10:15:25,456 - INFO - Epoch completed in 1.33s
[...]
2025-01-31 10:15:35,678 - INFO - Training completed in 12.22s
2025-01-31 10:15:35,789 - INFO - Best validation loss: 3.2456
```
The model saves checkpoints to `./checkpoints/` whenever the validation loss improves.
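The save-on-improvement behaviour described above can be sketched as a small helper. This is a minimal illustration and not the code in `sparse_transformer.py`. The class name `BestCheckpointSaver`, the file name `best_model.pt`, and the pickle-based default serializer are all assumptions chosen so the example runs without PyTorch.

```python
import os
import pickle
from typing import Any, Callable


def _pickle_save(state: Any, path: str) -> None:
    # Stand-in serializer (assumption) so the sketch runs without PyTorch.
    with open(path, "wb") as f:
        pickle.dump(state, f)


class BestCheckpointSaver:
    """Save a checkpoint only when validation loss reaches a new minimum."""

    def __init__(
        self,
        directory: str = "./checkpoints",
        save_fn: Callable[[Any, str], None] = _pickle_save,
    ) -> None:
        self.directory = directory
        self.save_fn = save_fn
        self.best_loss = float("inf")

    def step(self, val_loss: float, state: Any) -> bool:
        """Return True if `state` was saved as the new best checkpoint."""
        if val_loss >= self.best_loss:
            return False
        self.best_loss = val_loss
        os.makedirs(self.directory, exist_ok=True)
        # "best_model.pt" is a hypothetical file name.
        self.save_fn(state, os.path.join(self.directory, "best_model.pt"))
        return True
```

In a PyTorch training loop, one would likely pass `save_fn=torch.save` and call `saver.step(val_loss, model.state_dict())` after each validation pass, logging a message when it returns `True`.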
## Architecture
The current implementation includes:
- `LocalSparseAttention`: Implements window-based sparse attention where each token attends only to its neighbors
- `SparseTransformerBlock`: A single transformer block with sparse attention
- `SparseTransformer`: The full model with embedding layer and multiple transformer blocks
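As a rough illustration of the window-based pattern that `LocalSparseAttention` is described as implementing, the sketch below masks scaled dot-product attention so each token only attends to positions within a fixed radius. It is not the repository's class. The function names, the symmetric (non-causal) window, and the tensor layout `(batch, heads, seq_len, head_dim)` are assumptions made for the example.

```python
import math

import torch


def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask, True where |i - j| <= window."""
    idx = torch.arange(seq_len)
    return (idx[:, None] - idx[None, :]).abs() <= window


def local_window_attention(
    q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, window: int
) -> torch.Tensor:
    """Scaled dot-product attention restricted to a local window.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim) (assumed layout).
    """
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    mask = local_attention_mask(seq_len, window).to(q.device)
    # Positions outside the window get -inf so softmax gives them zero weight.
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


q = k = v = torch.randn(2, 4, 16, 8)
out = local_window_attention(q, k, v, window=2)
print(out.shape)  # torch.Size([2, 4, 16, 8])
```

Note that this dense-mask version still materialises the full `seq_len × seq_len` score matrix, so it demonstrates the attention pattern and not the memory savings. A more efficient variant would compute scores only for the in-window blocks.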
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## License
MIT
## Citation
If you use this code in your research, please cite:
```bibtex
@software{sparse_transformer2025,
author = {Dean Coulstock},
title = {Sparse Transformer Implementation},
year = {2025},
publisher = {GitHub},
url = {https://github.com/Sunsvea/cl-sparse-transformer}
}
```
## Contact
- Dean Coulstock
- deanjcoulstock@gmail.com
- LinkedIn: https://www.linkedin.com/in/dean-coulstock/
## Acknowledgments
This implementation draws inspiration from:
- "Generating Long Sequences with Sparse Transformers" (Child et al., 2019)
- "Longformer: The Long-Document Transformer" (Beltagy et al., 2020)