https://github.com/sbartlett97/torch-electra

A Custom implementation of the ELECTRA training method using PyTorch and HuggingFace Transformers
https://github.com/sbartlett97/torch-electra

machine-learning machine-learning-algorithms masked-image-modeling nlp nlp-machine-learning pretraining pretraining-bert python

Last synced: 12 months ago
JSON representation

A Custom implementation of the ELECTRA training method using PyTorch and HuggingFace Transformers

Host: GitHub
URL: https://github.com/sbartlett97/torch-electra
Owner: sbartlett97
License: mit
Created: 2024-11-29T18:04:46.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-22T21:09:08.000Z (over 1 year ago)
Last Synced: 2025-03-22T22:18:58.683Z (over 1 year ago)
Topics: machine-learning, machine-learning-algorithms, masked-image-modeling, nlp, nlp-machine-learning, pretraining, pretraining-bert, python
Language: Python
Homepage:
Size: 45.9 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

Awesome Lists containing this project

README

# ELECTRA Training Implementation

A PyTorch-based implementation of the ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) pre-training method using the HuggingFace Transformers library. This implementation focuses on providing an easy-to-use and extensible framework for pre-training transformer models using the ELECTRA approach.

## Features

- 🚀 Easy-to-use training pipeline
- 🔧 Support for custom model configurations
- 📊 Training progress tracking with loss curves
- ⚡ Mixed precision training support
- 🔄 Gradient accumulation for handling large batch sizes
- 🎛️ Hyperparameter optimization using Optuna
- 💾 Automatic checkpointing
- 📈 Triangular learning rate scheduling (matching original paper)

## Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/electra-implementation.git
cd electra-implementation

# Install dependencies
pip install -r requirements.txt
```

## Quick Start

Train a model using default settings (base ELECTRA configuration):
```bash
python main.py --run_name my_electra_model
```

### Training Options

Choose from different model sizes:
```bash
# Small ELECTRA
python main.py --preset small --run_name electra_small

# Base ELECTRA (default)
python main.py --preset base --run_name electra_base

# Large ELECTRA
python main.py --preset large --run_name electra_large
```

Customize training parameters:
```bash
python main.py \
--preset base \
--batch_size 32 \
--steps 1000000 \
--dataset_path "your/dataset/path" \
--run_name custom_electra
```

Run hyperparameter optimization:
```bash
python main.py --preset base --optuna
```

## Model Architecture

The implementation follows the original ELECTRA paper's architecture:

- **Generator & Discriminator**: Same number of layers but different widths
- **Model Configurations**:
- Small: 12-layer discriminator, 12-layer generator (generator has 1/3 the width)
- Base: 12-layer discriminator, 12-layer generator (generator has 1/3 the width)
- Large: 24-layer discriminator, 24-layer generator (generator has 1/4 the width)
- **Shared embeddings** between generator and discriminator

## Training Details

- Uses masked language modeling (MLM) for the generator
- Implements replaced token detection (RTD) for the discriminator
- Supports gradient accumulation for effective batch sizes
- Implements triangular learning rate schedule with warmup
- Uses mixed precision training for improved performance
- Tracks and saves training metrics

## Results and Metrics

Training progress can be monitored through:
- Real-time loss tracking in the console
- Generated loss curves (saved as `loss_curve.png`)
- Training logs (saved as `training_log.csv`)

## References

This implementation is based on the original ELECTRA paper and inspired by existing implementations:

- **ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators**
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
[ICLR 2020](https://openreview.net/pdf?id=r1xMH1BtvB)
```bibtex
@inproceedings{clark2020electra,
title = {{ELECTRA}: Pre-training Text Encoders as Discriminators Rather Than Generators},
author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning},
booktitle = {ICLR},
year = {2020},
url = {https://openreview.net/pdf?id=r1xMH1BtvB}
}
```

- **PyTorch Implementation of ELECTRA**
Richard Wang
[GitHub Repository](https://github.com/richarddwang/electra_pytorch)
```bibtex
@misc{electra_pytorch,
author = {Richard Wang},
title = {PyTorch implementation of ELECTRA},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/richarddwang/electra_pytorch}}
}
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- HuggingFace team for their Transformers library
- PyTorch team and community
- Original ELECTRA paper authors

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/sbartlett97/torch-electra

Awesome Lists containing this project

README