# Transformer from Scratch with PyTorch

## Overview
This project implements an English-to-Portuguese translation system using a Transformer model built from scratch. The model was developed based on the paper "Attention Is All You Need" (Vaswani et al., 2017), which introduced the Transformer architecture for translation and other natural language processing tasks.

## Project Description

The goal of this project is to build a machine translation model that can accurately translate English texts into Portuguese. The model was implemented from scratch without using pre-built Transformer libraries and is trained using the **Helsinki-NLP/opus_books** dataset available on Hugging Face Datasets.

## Dataset

The dataset used is [Helsinki-NLP/opus_books](https://huggingface.co/datasets/Helsinki-NLP/opus_books/viewer/en-pt), which contains a collection of books translated into English and Portuguese. This dataset is ideal for training translation models as it provides sentence pairs in both languages.
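For reference, the dataset can be loaded with the Hugging Face `datasets` library. The sketch below assumes the standard `opus_books` schema, where each example carries a `translation` dict keyed by language code:

```python
from datasets import load_dataset

# Load the English-Portuguese subset of opus_books
dataset = load_dataset("Helsinki-NLP/opus_books", "en-pt")

# Each example has a "translation" field keyed by language code
example = dataset["train"][0]
print(example["translation"]["en"])  # English sentence
print(example["translation"]["pt"])  # Portuguese sentence
```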

## Model Architecture
The Transformer model consists of:

- Encoder: Encodes the input sequence into an internal representation.
- Decoder: Decodes the internal representation to generate the output sequence.
- Attention Layers: Capture relationships between different parts of the input and output sequences (see the sketch below).
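
At the core of both the encoder and the decoder is scaled dot-product attention, `softmax(QK^T / sqrt(d_k)) V` from the paper. The following is a minimal illustrative sketch of that formula, not this repository's exact code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V (Vaswani et al., 2017)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, heads, seq_q, seq_k)
    if mask is not None:
        # Positions where mask == 0 are blocked (e.g. padding or future tokens)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```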

Training is performed using cross-entropy loss, and the model is optimized with the Adam optimizer.
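As a rough illustration of this setup (`model`, `dataloader`, and `pad_token_id` are placeholders, not this repository's actual names), one training step typically looks like the following. Note the paper pairs Adam with a warmup learning-rate schedule; the fixed learning rate here is a simplification:

```python
import torch
import torch.nn as nn

# Placeholders: `model` is the Transformer, `pad_token_id` the padding index
criterion = nn.CrossEntropyLoss(ignore_index=pad_token_id)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.98), eps=1e-9)

for src, tgt in dataloader:
    optimizer.zero_grad()
    # Teacher forcing: the decoder input is the target shifted right by one position
    logits = model(src, tgt[:, :-1])          # (batch, seq_len, vocab_size)
    loss = criterion(
        logits.reshape(-1, logits.size(-1)),  # flatten to (batch * seq, vocab)
        tgt[:, 1:].reshape(-1),               # predict the next token at each step
    )
    loss.backward()
    optimizer.step()
```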

## Usage

1. Train the Model
Run the `train.py` script to train the model:

```bash
python train/train.py
```
The script will load the dataset, train the Transformer model, and save the model weights after each epoch.

2. Translate Text
After training, you can use the `translate.py` script to translate English text into Portuguese:

```bash
python translate.py "Your English text here"
```
If you do not provide text, the script will use a default example.
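
Under the hood, translation with an encoder-decoder Transformer is typically autoregressive: the model generates one target token at a time, feeding each prediction back as decoder input. A minimal greedy-decoding sketch (the `model` call signature and token IDs here are hypothetical placeholders, not this repository's actual API):

```python
import torch

def greedy_translate(model, src_ids, bos_id, eos_id, max_len=128):
    """Generate target tokens one at a time, always taking the most likely next token."""
    model.eval()
    tgt_ids = [bos_id]
    with torch.no_grad():
        for _ in range(max_len):
            tgt = torch.tensor([tgt_ids])
            logits = model(src_ids, tgt)              # (1, len(tgt_ids), vocab_size)
            next_id = logits[0, -1].argmax().item()   # most probable next token
            tgt_ids.append(next_id)
            if next_id == eos_id:                     # stop at end-of-sequence
                break
    return tgt_ids
```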