https://github.com/2003harsh/transformer-based-decoder-only-language-model
This repository explores building a character-level, decoder-only transformer language model in PyTorch, similar in spirit to GPT but with the emphasis on understanding each individual component. The goal is to develop a deep working knowledge of the transformer architecture and to test whether character-level tokenization improves the handling of unseen words. The code supports hyperparameter tuning and experiment customization.
- Host: GitHub
- URL: https://github.com/2003harsh/transformer-based-decoder-only-language-model
- Owner: 2003HARSH
- License: MIT
- Created: 2024-06-16T13:46:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-30T11:31:23.000Z (over 1 year ago)
- Last Synced: 2025-01-11T09:47:47.911Z (about 1 year ago)
- Topics: from-scratch-in-python, gpt, language-model, pytorch, transformers
- Language: Jupyter Notebook
- Homepage:
- Size: 74.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Transformer-based (Decoder-only) Language Model from Scratch
This repository contains the implementation of a language model built from scratch using PyTorch. The model leverages a transformer-based (decoder-only) architecture, similar to GPT.
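At the heart of a decoder-only model is a stack of transformer blocks whose self-attention is causally masked, so each character can only attend to the characters before it. Below is a minimal sketch of one such block in PyTorch, using illustrative names (`n_embd`, `n_head`, `block_size`) that may not match the identifiers used in this repository's notebook.

```python
# Minimal sketch of a single decoder-only (GPT-style) block in PyTorch.
# Names such as n_embd, n_head, block_size are illustrative and may not
# match the identifiers used in the repository's notebook.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head, block_size, dropout):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.proj = nn.Linear(n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)
        # lower-triangular mask: each position attends only to earlier positions
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                   # (B, n_head, T, head_dim)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.dropout(self.proj(y))

class Block(nn.Module):
    """Pre-norm decoder block: masked attention, then MLP, each with a residual."""
    def __init__(self, n_embd, n_head, block_size, dropout):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size, dropout)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
            nn.Linear(4 * n_embd, n_embd), nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x
```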
## Model Architecture and Training Configuration
- **Model Parameters**: 14.26 million
- **Batch Size**: 256
- **Block Size**: 64
- **Max Iterations**: 10,000
- **Learning Rate**: 3e-4
- **Evaluation Iterations**: 100
- **Embedding Size**: 384
- **Number of Layers**: 12
- **Number of Attention Heads**: 8
- **Dropout Rate**: 0.2
- **Context Window**: 64 characters
- **Tokenizer**: Character-level
- **Vocabulary Size**: 145
- **Training Data**: 2 GB of open-source textual data
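As a rough sketch, the settings above translate into a handful of training constants plus a character-level vocabulary built from the unique characters of the training corpus. The snippet below is illustrative only; the variable names and the corpus path (`data.txt`) are assumptions, not the notebook's actual code.

```python
# Hyperparameters from the list above, expressed as plain constants
# (illustrative names; the notebook may organise them differently).
batch_size = 256      # sequences per optimisation step
block_size = 64       # context window, in characters
max_iters = 10_000
learning_rate = 3e-4
eval_iters = 100
n_embd = 384
n_layer = 12
n_head = 8
dropout = 0.2

# Character-level "tokenizer": the vocabulary is simply the set of
# distinct characters seen in the training text (145 in this repository).
text = open("data.txt", encoding="utf-8").read()    # hypothetical corpus path
chars = sorted(set(text))
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]             # string -> list of ints
decode = lambda ids: "".join(itos[i] for i in ids)  # list of ints -> string
```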
## Why Build a Language Model from Scratch?
In an era when we can easily import state-of-the-art large language models (LLMs) such as GPT, BERT, or T5, you might wonder why anyone would build one from scratch. Here’s why:
1. **Deep Understanding**: Constructing a language model from scratch provides an in-depth understanding of the underlying mechanics of transformer architectures. This knowledge is crucial for debugging, fine-tuning, and optimizing models for specific tasks.
2. **Customization**: Off-the-shelf models are often generalized to handle a broad range of tasks. Building a custom model allows for tailored adjustments, ensuring optimal performance for specialized applications.
3. **Innovation**: By exploring the fundamental building blocks, we can experiment with novel architectures and techniques, potentially leading to innovative solutions and improvements beyond current state-of-the-art models.
4. **Educational Value**: The process of designing and implementing a model from scratch is an invaluable learning experience. It sharpens problem-solving skills, deepens expertise in machine learning, and enhances coding proficiency.
5. **Performance Optimization**: Understanding the intricacies of the model enables us to optimize its performance, memory usage, and speed, which is particularly beneficial for deploying models in resource-constrained environments.
6. **Transparency**: Building from scratch ensures complete transparency and control over the model's architecture, data handling, and training process, which is essential for developing ethical and trustworthy AI solutions.
## Getting Started
### Prerequisites
- Python 3.8+
- PyTorch 1.8.1+
### Installation
Clone the repository:
```bash
git clone https://github.com/2003HARSH/Transformer-based-Decoder-only-Language-Model-from-Scratch.git
cd Transformer-based-Decoder-only-Language-Model-from-Scratch
```
Install the required packages:
```bash
pip install -r requirements.txt
```
## Final Thoughts
Although this model does not yet match the capabilities of the leading LLMs, building it has been incredibly informative. It is particularly rewarding to watch the character-level model assemble individual characters into meaningful words, which is a significant milestone in itself. Given enough data and compute, the same architecture could be scaled up considerably further.
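For reference, character-level generation of this kind is typically an autoregressive loop that samples one character at a time. The sketch below assumes a `model(idx)` call that returns next-character logits of shape `(B, T, vocab_size)` and the hypothetical `encode`/`decode` helpers from the earlier snippet; the notebook's actual generation code may differ.

```python
# Sketch of autoregressive, character-by-character sampling. Illustrative only.
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=64):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]           # crop to the context window
        logits = model(idx_cond)                  # (B, T, vocab_size)
        logits = logits[:, -1, :]                 # keep only the last position
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)    # append the sampled character
    return idx

# Usage (hypothetical): start from a single newline character and decode the output.
# context = torch.tensor([[stoi["\n"]]], dtype=torch.long)
# print(decode(generate(model, context, max_new_tokens=200)[0].tolist()))
```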
Feel free to contribute to the project or use it as a learning tool for understanding transformer architectures and language model training from scratch.
## Contributing
We welcome contributions to improve the model, add new features, or fix bugs. Please open an issue or submit a pull request.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.