# Sparse Transformer Implementation

This repository contains a PyTorch implementation of a Transformer model with sparse attention patterns. The goal is to explore and implement various sparse attention mechanisms to improve the efficiency of transformer models while maintaining performance.

## Current Features

- Local sparse attention mechanism (window-based)
- Configurable model architecture (layers, heads, dimensions)
- Basic positional encoding
- Simple training loop for sequence prediction
- CPU support

## Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/sparse-transformer.git
cd sparse-transformer

# Install dependencies
pip install torch
```
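
To confirm that PyTorch installed correctly before running anything else, a quick check:

```python
import torch

print(torch.__version__)          # should print the installed PyTorch version
print(torch.cuda.is_available())  # False is fine here; the current implementation targets CPU
```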

## Quick Start

```bash
# Run the example training script
python sparse_transformer.py
```
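
The training script is self-contained, but the model class can also be used directly. The snippet below is only a sketch: the `SparseTransformer` class name comes from the Architecture section further down, while every constructor argument shown (`vocab_size`, `d_model`, `n_heads`, `n_layers`, `window_size`) is a hypothetical illustration; check `sparse_transformer.py` for the real signature.

```python
import torch

from sparse_transformer import SparseTransformer  # class listed in the Architecture section

# Hypothetical configuration; see sparse_transformer.py for the actual argument names.
model = SparseTransformer(
    vocab_size=1000,   # size of the token vocabulary
    d_model=128,       # embedding / hidden dimension
    n_heads=4,         # attention heads per block
    n_layers=2,        # number of stacked transformer blocks
    window_size=8,     # local attention window (each token sees +/- 8 neighbors)
)

tokens = torch.randint(0, 1000, (2, 64))  # (batch, sequence_length) of token ids
logits = model(tokens)                    # forward pass on CPU
```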

## Output

When running the training script, you should see output similar to this:

```
2025-01-31 10:15:23,456 - INFO - Starting training...
2025-01-31 10:15:23,789 - INFO - Generated 1000 sample sequences...
2025-01-31 10:15:23,901 - INFO - Split data into 800 train and 200 validation sequences

2025-01-31 10:15:24,123 - INFO - Epoch 1/5
2025-01-31 10:15:24,456 - INFO - Batch 0, Loss: 4.6573
2025-01-31 10:15:24,789 - INFO - Batch 10, Loss: 4.3291
2025-01-31 10:15:25,012 - INFO - Training Loss: 4.2845
2025-01-31 10:15:25,234 - INFO - Validation Loss: 4.1932
2025-01-31 10:15:25,345 - INFO - Saved new best model checkpoint
2025-01-31 10:15:25,456 - INFO - Epoch completed in 1.33s

[...]

2025-01-31 10:15:35,678 - INFO - Training completed in 12.22s
2025-01-31 10:15:35,789 - INFO - Best validation loss: 3.2456
```

The model saves checkpoints to `./checkpoints/` whenever the validation loss improves.
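
Restoring a saved model might look like the sketch below. The file name `best_model.pt` and the assumption that the script saves a plain `state_dict` are both guesses; adjust them to whatever the training script actually writes to `./checkpoints/`.

```python
import torch

from sparse_transformer import SparseTransformer

# Rebuild the model with the same (hypothetical) configuration used for training,
# then restore the best checkpoint saved during training.
model = SparseTransformer(vocab_size=1000, d_model=128, n_heads=4, n_layers=2, window_size=8)
state_dict = torch.load("checkpoints/best_model.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode before evaluating
```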

## Architecture

The current implementation includes:

- `LocalSparseAttention`: Implements window-based sparse attention where each token attends only to its neighbors within a fixed window (see the sketch after this list)
- `SparseTransformerBlock`: A single transformer block with sparse attention
- `SparseTransformer`: The full model with embedding layer and multiple transformer blocks
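
To make the window-based idea concrete, here is a minimal single-head sketch of local attention, written independently of the repository's actual `LocalSparseAttention` code: each position's attention scores are masked so it can only attend to keys within a fixed window around it.

```python
import torch
import torch.nn.functional as F

def local_attention(q, k, v, window_size):
    """Single-head sketch: position i attends only to positions j with |i - j| <= window_size."""
    seq_len, d_k = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5      # (batch, seq_len, seq_len)

    # Band mask: True where |i - j| <= window_size, False elsewhere.
    idx = torch.arange(seq_len, device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() <= window_size

    scores = scores.masked_fill(~band, float("-inf"))  # forbid attention outside the window
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 16 tokens, 32-dimensional head, window of 2 tokens on each side.
q = k = v = torch.randn(1, 16, 32)
out = local_attention(q, k, v, window_size=2)
print(out.shape)  # torch.Size([1, 16, 32])
```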

## TODO List

### Phase 1: Core Functionality
- [x] Add proper data loading and preprocessing
- [x] Implement validation loop
- [x] Add model checkpointing
- [ ] Add logging and metrics tracking
- [ ] Write unit tests for core components

### Phase 2: Sparse Attention Patterns
- [ ] Implement strided sparse attention
- [ ] Add block sparse attention
- [ ] Implement learned sparsity patterns
- [ ] Create dynamic/adaptive sparsity mechanisms

### Phase 3: Optimizations
- [ ] Optimize sparse matrix operations
- [ ] Add mixed precision training
- [ ] Implement gradient checkpointing
- [ ] Add multi-GPU support
- [ ] Optimize memory usage

### Phase 4: Analysis & Visualization
- [ ] Add attention pattern visualization
- [ ] Create performance benchmarking suite
- [ ] Add sparsity pattern analysis tools
- [ ] Implement attention head importance analysis
- [ ] Create training dynamics visualization

### Phase 5: Documentation & Examples
- [ ] Add detailed API documentation
- [ ] Create Jupyter notebook tutorials
- [ ] Add example configurations
- [ ] Write contribution guidelines
- [ ] Create performance comparison benchmarks

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

## License

MIT

## Citation

If you use this code in your research, please cite:

```bibtex
@software{sparse_transformer2025,
  author    = {Dean Coulstock},
  title     = {Sparse Transformer Implementation},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/Sunsvea/sparse-transformer}
}
```

## Contact

- Dean Coulstock
- [email protected]
- LinkedIn: https://www.linkedin.com/in/dean-coulstock/

## Acknowledgments

This implementation draws inspiration from:
- "Generating Long Sequences with Sparse Transformers" (Child et al., 2019)
- "Longformer: The Long-Document Transformer" (Beltagy et al., 2020)