https://github.com/dhruvdcoder/xlm-core
XLM is a modular, research-friendly framework for developing and comparing non-autoregressive language models. Built on PyTorch and PyTorch Lightning, with Hydra for configuration management, XLM makes it effortless to experiment with cutting-edge NAR architectures.
- Host: GitHub
- URL: https://github.com/dhruvdcoder/xlm-core
- Owner: dhruvdcoder
- Created: 2025-10-04T01:29:20.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2026-02-13T23:45:26.000Z (4 months ago)
- Last Synced: 2026-02-14T05:31:35.741Z (4 months ago)
- Topics: diffusion-models, non-autoregressive-generation, small-language-models
- Language: Python
- Homepage:
- Size: 6.15 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 3
A Unified Framework for Non-Autoregressive Language Models
---
XLM is a **modular, research-friendly framework** for developing and comparing non-autoregressive (NAR) language models. It is built on PyTorch and PyTorch Lightning, uses Hydra for configuration management, and is designed to make experimenting with recent NAR architectures straightforward.
## Key Features
| Feature                    | Description                                                                            |
|----------------------------|----------------------------------------------------------------------------------------|
| **Modular Design**         | Plug-and-play components: swap models, losses, predictors, and collators independently |
| **Lightning-Powered**      | Distributed training, mixed precision, and logging out of the box                      |
| **Hydra Configs**          | Hierarchical configuration with runtime overrides, so no code changes are needed       |
| **Multiple Architectures** | 7 NAR model families ready to use                                                      |
| **Research-First**         | Type-safe with `jaxtyping`, debug modes, and flexible metric injection                 |
| **Hub Integration**        | Push trained models directly to Hugging Face Hub                                       |
## Available Models
| Model | Full Name | Description |
|--------|--------------------------|--------------------------------------|
| `mlm` | Masked Language Model | Classic BERT-style masked prediction |
| `ilm` | Insertion Language Model | Iterative insertion-based generation |
| `arlm` | Autoregressive LM | Standard left-to-right baseline |
| `mdlm` | Masked Diffusion LM | Discrete diffusion with masking |
| `idlm` | Diffusion Insertion LM | Multi-token insertion diffusion |
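To make the `mdlm` row more concrete, here is a minimal sketch of the forward (noising) step of masked discrete diffusion in plain Python. It is not XLM's implementation: `MASK_ID`, `corrupt`, and the simple rule that each token is masked with probability `t` are assumptions chosen for illustration.

```python
import random

MASK_ID = -1  # hypothetical id standing in for the [MASK] token


def corrupt(tokens: list[int], t: float, rng: random.Random) -> list[int]:
    """Independently replace each token with MASK_ID with probability t.

    t=0.0 leaves the sequence untouched; t=1.0 masks every position.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
clean = [11, 12, 13, 14, 15]
for t in (0.0, 0.5, 1.0):
    # Higher t means more positions are replaced by MASK_ID.
    print(t, corrupt(clean, t, rng))
```

A denoising model is then trained to recover the original tokens at the masked positions, and generation runs the process in reverse, starting from a fully masked sequence.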
## Installation
```bash
pip install xlm-core
```
For model implementations, also install:
```bash
pip install xlm-models
```
## Quick Start
XLM uses a simple CLI with three main arguments:
```bash
xlm job_type= job_name= experiment=
```
| Argument     | Description                                                          |
|--------------|----------------------------------------------------------------------|
| `job_type`   | One of `prepare_data`, `train`, `eval`, `generate`, or `push_to_hub` |
| `job_name`   | A descriptive name for your run                                      |
| `experiment` | Path to your Hydra experiment config                                 |
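If you launch runs from a Python script rather than a shell, the three arguments can be assembled programmatically. The sketch below is an assumption-laden convenience, not part of XLM: `build_xlm_command` and `JOB_TYPES` are hypothetical names, and the set of job types is simply the ones that appear in this README.

```python
import shlex

# Job types that appear in this README (assumed to be the accepted values).
JOB_TYPES = {"prepare_data", "train", "eval", "generate", "push_to_hub"}


def build_xlm_command(
    job_type: str, job_name: str, experiment: str, *overrides: str
) -> list[str]:
    """Return the argv list for an `xlm` invocation (hypothetical helper)."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job_type: {job_type!r}")
    return [
        "xlm",
        f"job_type={job_type}",
        f"job_name={job_name}",
        f"experiment={experiment}",
        *overrides,  # extra Hydra overrides, passed through unchanged
    ]


cmd = build_xlm_command("train", "lm1b_ilm", "lm1b_ilm", "debug=overfit")
print(shlex.join(cmd))
# To actually launch the run (requires xlm-core to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when an override contains characters such as `[` or `]`.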
## Example: ILM on LM1B
A complete workflow demonstrating the Insertion Language Model on the LM1B dataset:
### 1. Prepare Data
```bash
xlm job_type=prepare_data job_name=lm1b_prepare experiment=lm1b_ilm
```
### 2. Train
```bash
# Quick debug run (overfit a single batch)
xlm job_type=train job_name=lm1b_ilm experiment=lm1b_ilm debug=overfit
# Full training
xlm job_type=train job_name=lm1b_ilm experiment=lm1b_ilm
```
### 3. Evaluate
```bash
xlm job_type=eval job_name=lm1b_ilm experiment=lm1b_ilm \
+eval.ckpt_path=
```
### 4. Generate
```bash
xlm job_type=generate job_name=lm1b_ilm experiment=lm1b_ilm \
+generation.ckpt_path=
```
**Tip:** Add `debug=[overfit,print_predictions]` to print generated samples to the console:
```bash
xlm job_type=generate job_name=lm1b_ilm experiment=lm1b_ilm \
+generation.ckpt_path= \
debug=[overfit,print_predictions]
```
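The commands above use three Hydra override forms: `key=value` changes an existing setting, `+key=value` adds a key that is not already in the config, and `[a,b]` passes a list. The sketch below is a toy classifier for these forms, not Hydra's real parser; `describe_override` and the checkpoint path are hypothetical. It also shows that the list form needs quoting in a shell, because square brackets are glob characters.

```python
import shlex


def describe_override(arg: str) -> dict:
    """Toy classifier for the override forms used in this README."""
    key, sep, value = arg.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {arg!r}")
    adds_new_key = key.startswith("+")
    if adds_new_key:
        key = key[1:]  # drop the leading "+"
    if value.startswith("[") and value.endswith("]"):
        # List value such as [overfit,print_predictions]
        parsed = [v.strip() for v in value[1:-1].split(",") if v.strip()]
    else:
        parsed = value
    return {"key": key, "value": parsed, "adds_new_key": adds_new_key}


examples = (
    "debug=overfit",
    "+generation.ckpt_path=path/to/model.ckpt",  # hypothetical path
    "debug=[overfit,print_predictions]",
)
for arg in examples:
    # shlex.quote adds quotes only where the shell would need them.
    print(describe_override(arg), "| shell-safe:", shlex.quote(arg))
```

In shells that treat unmatched globs as errors, such as zsh, writing `'debug=[overfit,print_predictions]'` with quotes avoids a "no matches found" failure.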
### 5. Push to Hugging Face Hub
```bash
xlm job_type=push_to_hub job_name=lm1b_ilm_hub experiment=lm1b_ilm \
+hub_checkpoint_path= \
+hub.repo_id=
```
## Project Structure
```
xlm-core/
├── src/xlm/              # Core framework
│   ├── harness.py        # PyTorch Lightning module
│   ├── datamodule.py     # Data loading & collation
│   ├── metrics.py        # Evaluation metrics
│   └── configs/          # Default Hydra configs
│
└── xlm-models/           # Model implementations
    ├── mlm/              # Masked LM
    ├── ilm/              # Insertion LM
    ├── arlm/             # Autoregressive LM
    └── ...               # Other architectures
```
## Extending XLM
Adding a new model requires implementing four components:
| Component | Responsibility |
|---------------|-----------------------------|
| **Model** | Neural network architecture |
| **Loss** | Training objective |
| **Predictor** | Inference/generation logic |
| **Collator** | Batch preparation |
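The division of labor between the four components can be sketched with toy stand-ins in plain Python. Everything here is hypothetical (`ModelFamily`, `pad_collate`, `copy_model`, `mismatch_loss`, `copy_predictor`); XLM's real base classes and registration mechanism are described in the Contributing Guide and will differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

Batch = dict[str, list[list[int]]]


@dataclass
class ModelFamily:
    """Hypothetical bundle of the four pieces a new model provides."""
    model: Callable[[Batch], Any]          # forward pass
    loss: Callable[[Any, Batch], float]    # training objective
    predictor: Callable[[Batch], Any]      # inference / generation
    collator: Callable[[Sequence[Sequence[int]]], Batch]  # examples -> batch


def pad_collate(examples: Sequence[Sequence[int]], pad_id: int = 0) -> Batch:
    """Right-pad every example to the longest one and build a mask."""
    width = max(len(e) for e in examples)
    return {
        "input_ids": [list(e) + [pad_id] * (width - len(e)) for e in examples],
        "attention_mask": [[1] * len(e) + [0] * (width - len(e)) for e in examples],
    }


def copy_model(batch: Batch) -> list[list[int]]:
    """Toy 'network' that predicts its own input."""
    return batch["input_ids"]


def mismatch_loss(outputs: list[list[int]], batch: Batch) -> float:
    """Fraction of non-padding positions where the output is wrong."""
    pairs = [
        (o, t)
        for out, tgt, mask in zip(outputs, batch["input_ids"], batch["attention_mask"])
        for o, t, m in zip(out, tgt, mask)
        if m
    ]
    return sum(o != t for o, t in pairs) / len(pairs)


def copy_predictor(batch: Batch) -> list[list[int]]:
    """Run the model and strip padding from each row."""
    return [
        [tok for tok, m in zip(row, mask) if m]
        for row, mask in zip(copy_model(batch), batch["attention_mask"])
    ]


family = ModelFamily(copy_model, mismatch_loss, copy_predictor, pad_collate)
batch = family.collator([[5, 6, 7], [8]])
print(family.loss(family.model(batch), batch), family.predictor(batch))
```

Keeping the four roles behind separate callables is what allows one piece, for example the collator, to be swapped without touching the others.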
You can also add new entrypoint scripts to the CLI.
See the [Contributing Guide](./wiki/CONTRIBUTING.md) for a complete walkthrough.
## Documentation
- [Data Pipeline](./wiki/datapipeline.md): How data flows through XLM
- [Training Scripts](./wiki/scripts/training.md): Advanced training options
- [Generation](./wiki/scripts/generation.md): Decoding strategies and parameters
- [External Models](./wiki/EXTERNAL_MODELS.md): Using pretrained weights
## Contributing
We welcome model contributions! Please check out our [Contributing Guide](./wiki/CONTRIBUTING.md) for guidelines on adding new models and features.
## License
This project is licensed under the MIT License.
## Acknowledgements
XLM is developed and maintained by [IESL](https://iesl.cs.umass.edu/) students at UMass Amherst.
**Primary Developers:**
1. [Dhruvesh Patel](https://dhruveshp.com)
2. [Durga Prasad Maram](https://github.com/Durga-Prasad1)
3. [Sai Sreenivas Chintha](https://github.com/sensai99)
4. [Benjamin Rozonoyer](https://brozonoyer.github.io/)
**Model Contributors:**
1. Soumitra Das (EditFlow)
2. Eric Chen (EditFlow)
---
Built with ❤️ for the NLP research community