https://github.com/the-swarm-corporation/dart
DART (Diffusion-Autoregressive Recursive Transformer) is a novel hybrid architecture that combines diffusion-based and autoregressive approaches for text generation.
- Host: GitHub
- URL: https://github.com/the-swarm-corporation/dart
- Owner: The-Swarm-Corporation
- License: mit
- Created: 2025-04-28T03:12:43.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-04-28T03:28:40.000Z (about 1 month ago)
- Last Synced: 2025-04-30T01:09:49.655Z (about 1 month ago)
- Topics: agents, anthropic, attention, autogressive, diffusion, dit, gpts, llms, midjourney, openai, research, text-generation, torch, transformers, vision-models
- Language: Python
- Homepage: https://discord.gg/jM3Z6M9uMq
- Size: 46.9 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# DART: Diffusion-Autoregressive Recursive Transformer

## Overview
DART (Diffusion-Autoregressive Recursive Transformer) is a novel hybrid architecture that combines diffusion-based and autoregressive approaches for text generation. By leveraging both paradigms, DART achieves robust global coherence through diffusion while maintaining local consistency via autoregressive modeling.

## Key Features
- **Hybrid Architecture**: Integrates diffusion and autoregressive components in a unified framework
- **Adaptive Noise Scheduling**: Implements multiple noise scheduling strategies (linear, cosine, quadratic)
- **Flexible Generation**: Supports both conditional and unconditional text generation
- **Production Ready**: Full type annotations, comprehensive logging, and configurable parameters
- **Efficient Implementation**: Optimized attention mechanisms and memory usage
- **Modular Design**: Easy to extend and modify for specific use cases

## Model Architecture
DART consists of several key components:
- Diffusion Transformer (DiT) blocks for global dependency modeling
- Autoregressive blocks for local coherence
- Adaptive noise scheduling mechanism
- Dual-path information exchange during training
- Classifier-free guidance support
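
As a rough illustration of how the dual-path design could blend its two objectives, the sketch below mixes a diffusion-style denoising loss with an autoregressive next-token loss using an `ar_weight` factor (as in the configuration further down). The function and tensor names here are hypothetical placeholders; see the actual `DART.compute_loss` for the real implementation.

```python
import torch
import torch.nn.functional as F


def hybrid_loss(
    diffusion_pred: torch.Tensor,    # predicted noise, (batch, seq, hidden)
    diffusion_target: torch.Tensor,  # true noise added at the sampled timestep
    ar_logits: torch.Tensor,         # next-token logits, (batch, seq, vocab)
    target_ids: torch.Tensor,        # ground-truth token ids, (batch, seq)
    ar_weight: float = 0.5,
) -> torch.Tensor:
    # Diffusion path: mean-squared error on the predicted noise.
    diffusion_loss = F.mse_loss(diffusion_pred, diffusion_target)
    # Autoregressive path: cross-entropy on shifted next-token prediction.
    ar_loss = F.cross_entropy(
        ar_logits[:, :-1].reshape(-1, ar_logits.size(-1)),
        target_ids[:, 1:].reshape(-1),
    )
    # Blend the two objectives with the configurable ar_weight.
    return ar_weight * ar_loss + (1 - ar_weight) * diffusion_loss
```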
## Installation

```bash
git clone https://github.com/The-Swarm-Corporation/DART.git
cd DART
pip install -r requirements.txt
```

## Quick Start
```python
import torch

from main import DART, DARTConfig

# Initialize configuration
config = DARTConfig(
    vocab_size=50257,  # GPT-2 vocabulary size
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    diffusion_steps=1000,
    ar_weight=0.5,
)

# Initialize model
model = DART(config)

# Training example
input_ids = torch.randint(0, config.vocab_size, (4, 128))
loss_dict = model.compute_loss(input_ids)
loss = loss_dict["loss"]
loss.backward()

# Generation example
generated = model.generate(
    input_ids=torch.tensor([[0, 1, 2, 3]]),
    max_length=128,
    temperature=0.8,
    do_sample=True,
)
```

## Configuration Options
| Parameter | Default | Description |
|-----------|---------|-------------|
| vocab_size | 50257 | Vocabulary size (default: GPT-2) |
| hidden_size | 768 | Dimension of hidden layers |
| num_hidden_layers | 12 | Number of transformer layers |
| num_attention_heads | 12 | Number of attention heads |
| diffusion_steps | 1000 | Number of diffusion steps |
| ar_weight | 0.5 | Weighting between the AR and diffusion objectives |
## Advanced Usage

### Custom Noise Scheduling
```python
model = DART(DARTConfig(
    diffusion_schedule="cosine",  # Options: linear, cosine, quadratic
    diffusion_steps=1000,
))
```
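
For intuition, the three schedule names correspond to standard diffusion noise (beta) schedules. The sketch below is purely illustrative and assumes the conventional formulations (e.g. the cosine schedule of Nichol & Dhariwal, 2021); it is not the repository's internal implementation.

```python
import math

import torch


def make_beta_schedule(kind: str, steps: int = 1000) -> torch.Tensor:
    """Illustrative beta schedules for 'linear', 'quadratic', and 'cosine'."""
    if kind == "linear":
        # Betas increase linearly from a small to a larger value.
        return torch.linspace(1e-4, 0.02, steps)
    if kind == "quadratic":
        # Interpolate linearly in sqrt(beta) space, then square.
        return torch.linspace(1e-4 ** 0.5, 0.02 ** 0.5, steps) ** 2
    if kind == "cosine":
        # Derive betas from a cosine-shaped cumulative-alpha curve.
        s = 0.008
        t = torch.linspace(0, steps, steps + 1) / steps
        alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
        alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
        betas = 1 - alphas_cumprod[1:] / alphas_cumprod[:-1]
        return betas.clamp(max=0.999)
    raise ValueError(f"unknown schedule: {kind}")
```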
### Classifier-Free Guidance

```python
generated = model.generate(
    input_ids=input_ids,
    guidance_scale=7.5,  # Higher values = stronger guidance
)
```
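
Classifier-free guidance generally works by running the model with and without conditioning and extrapolating between the two predictions. The snippet below is a generic sketch of that idea rather than DART's internal code; `cond_logits` and `uncond_logits` are hypothetical tensors standing in for the two forward passes.

```python
import torch

# Hypothetical logits from two forward passes over the same positions:
# one with the conditioning prefix, one with it dropped or masked out.
cond_logits = torch.randn(1, 128, 50257)
uncond_logits = torch.randn(1, 128, 50257)

guidance_scale = 7.5  # Higher values = stronger guidance
guided_logits = uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```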
## Citation

If you use DART in your research, please cite:

```bibtex
@article{dart2024,
  title={DART: Diffusion-Autoregressive Recursive Transformer for Text Generation},
  author={Kye Gomez},
  journal={[Journal/Conference]},
  year={2024}
}
```

## Contributing
We welcome contributions! Please see our [contributing guidelines](CONTRIBUTING.md) for details.

## License
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

## Acknowledgments
- Thanks to the authors of DiT and related works in diffusion models
- Built with PyTorch and Transformers library
- Special thanks to the research community for valuable feedback