https://github.com/adrianocleao/transformers-from-scratch
This repository is dedicated to reconstructing the Transformer architecture from the ground up using PyTorch. Based on the model presented in the "Attention Is All You Need" paper, this project aims to build a deeper understanding of one of the most important advances in natural language processing.
- Host: GitHub
- URL: https://github.com/adrianocleao/transformers-from-scratch
- Owner: AdrianoCLeao
- Created: 2024-08-03T02:14:49.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-05T05:58:10.000Z (10 months ago)
- Last Synced: 2025-02-23T10:44:04.044Z (3 months ago)
- Topics: artificial-intelligence, attention-is-all-you-need, machine-learning, neural-network, nlp, pythorch, transformers
- Language: Python
- Homepage:
- Size: 26.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Transformer from Scratch with PyTorch
## Overview
This project implements an English-to-Portuguese translation system using a Transformer model built from scratch. The model was developed based on the paper "Attention Is All You Need" (Vaswani et al., 2017), which introduced the Transformer architecture for translation and other natural language processing tasks.

## Project Description
The goal of this project is to build a machine translation model that can accurately translate English texts into Portuguese. The model was implemented from scratch without using pre-built Transformer libraries and is trained using the **Helsinki-NLP/opus_books** dataset available on Hugging Face Datasets.
## Dataset
The dataset used is [Helsinki-NLP/opus_books](https://huggingface.co/datasets/Helsinki-NLP/opus_books/viewer/en-pt), which contains a collection of books translated into English and Portuguese. This dataset is ideal for training translation models as it provides sentence pairs in both languages.
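For reference, here is a minimal sketch of loading the sentence pairs with the Hugging Face `datasets` library (the `en-pt` config name follows the dataset viewer link above; the repository's own loading code may differ):

```python
from datasets import load_dataset

# Load the English-Portuguese configuration of opus_books.
dataset = load_dataset("Helsinki-NLP/opus_books", "en-pt")

# Each example is a "translation" dict keyed by language code.
pair = dataset["train"][0]["translation"]
print(pair["en"], "->", pair["pt"])
```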
## Model Architecture
The Transformer model consists of:
- Encoder: Encodes the input sequence into an internal representation.
- Decoder: Decodes the internal representation to generate the output sequence.
- Attention Layers: Used to capture relationships between different parts of the input and output.
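As an illustration of the attention layers, here is a minimal scaled dot-product attention in PyTorch. This is a sketch of the mechanism from the paper, not necessarily this repository's exact implementation:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in Vaswani et al. (2017)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        # Block attention to masked positions (e.g., padding or future tokens).
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                   # (..., seq_q, d_v)
```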
Training is performed using cross-entropy loss, and the model is optimized with the Adam optimizer.
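A minimal sketch of that training setup (the `model` interface, batch tensors, and `PAD_ID` are illustrative assumptions, not the repository's exact names):

```python
import torch
import torch.nn as nn

PAD_ID = 0  # assumed padding-token id, excluded from the loss

def train_step(model, optimizer, src_tokens, tgt_tokens, labels):
    """Run one cross-entropy optimization step on a batch of token tensors."""
    criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)
    logits = model(src_tokens, tgt_tokens)               # (batch, seq, vocab)
    loss = criterion(logits.view(-1, logits.size(-1)), labels.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Adam configured along the lines of the paper, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.98), eps=1e-9)
```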
## Usage

1. Train the Model

Run the `train.py` script to train the model:

```bash
python .\train\train.py
```
The script will load the dataset, train the Transformer model, and save the model weights after each epoch.

2. Translate Text
After training, you can use the `translate.py` script to translate English text into Portuguese:

```bash
python translate.py "Your English text here"
```
If you do not provide text, the script will use a default example.
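Internally, a Transformer translator generates its output autoregressively. Here is a minimal greedy-decoding sketch (the `BOS_ID`/`EOS_ID` token ids and the `model(src, tgt)` call signature are illustrative assumptions, not this repository's exact API):

```python
import torch

BOS_ID, EOS_ID = 1, 2  # assumed start/end-of-sequence token ids

@torch.no_grad()
def greedy_translate(model, src_tokens, max_len=128):
    """Generate target token ids one at a time, always taking the argmax."""
    tgt = torch.tensor([[BOS_ID]], dtype=torch.long)
    for _ in range(max_len):
        logits = model(src_tokens, tgt)          # (1, tgt_len, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        tgt = torch.cat([tgt, next_id], dim=1)
        if next_id.item() == EOS_ID:
            break
    return tgt.squeeze(0).tolist()
```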