Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/muhammad-fiaz/gpt
A simple implementation based on the "Attention is All You Need" paper, using GPT-2 for text generation.
attention-is-all-you-need gpt gpt-2 gpt-3 gpt-implementation gpt-using-pytorch gpt2 numpy open-source paper-implementations python pytorch pytorch-implementation
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/muhammad-fiaz/gpt
- Owner: muhammad-fiaz
- License: mit
- Created: 2023-12-21T15:59:20.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-06T10:53:20.000Z (10 days ago)
- Last Synced: 2025-01-06T11:24:23.685Z (10 days ago)
- Topics: attention-is-all-you-need, gpt, gpt-2, gpt-3, gpt-implementation, gpt-using-pytorch, gpt2, numpy, open-source, paper-implementations, python, pytorch, pytorch-implementation
- Language: Python
- Homepage:
- Size: 101 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
GPT Implementation using PyTorch
A simple implementation of the ["Attention is All You Need"](https://arxiv.org/pdf/1706.03762) paper, focused on learning and building small-scale models, using GPT-2 for text generation.
> **This project is currently in active development. Make sure to ⭐ the repository! If you want to contribute, make sure to fork the repository.**
## Table of Contents
1. [Requirements](#requirements)
2. [Usage](#usage)
3. [Arguments](#arguments)
- [--text](#text-str)
- [--nsamples](#nsamples-int)
- [--batch_size](#batch_size-int)
- [--length](#length-int)
- [--temperature](#temperature-float)
- [--top_k](#top_k-int)
- [--quiet](#quiet-bool)
- [--unconditional](#unconditional-bool)
- [--seed](#seed-int-optional)
- [--stop_token](#stop_token-str-optional)
- [--skip_special_tokens](#skip_special_tokens-bool)
- [--clean_up_tokenization_spaces](#clean_up_tokenization_spaces-bool)
- [--do_sample](#do_sample-bool)
- [--num_return_sequences](#num_return_sequences-int)
- [--repetition_penalty](#repetition_penalty-float)
- [--length_penalty](#length_penalty-float)
- [--param](#param-str)
4. [Model Download](#model-download)
5. [Example](#example)
6. [License](#license)
7. [Weights License](#weights-license)
8. [Credits](#credits)

## Requirements
Before running the script, make sure you have the required dependencies installed:
```bash
pip install -r requirements.txt
```

You will also need to download the pre-trained GPT-2 model. The script will handle this for you if the model is not found locally.
## Usage
To generate text using GPT-2, use the following command:
```bash
python run.py --text "Your input text here" --nsamples 3 --batch_size 1 --length 50 --temperature 0.7 --top_k 40 --param 124M
```

## Arguments

- `--param` (str):
Specifies the pre-trained model size to use, such as `124M`, `355M`, `774M`, or `1558M`. Default is `124M`.
```bash
--param 355M
```

- `--text` (str):
The input text that will be used as the starting point for text generation. For example:
```bash
--text "Once upon a time"
```

- `--nsamples` (int):
Number of samples (generated text sequences) to produce. Default is `1`.
```bash
--nsamples 3
```

- `--batch_size` (int):
Number of samples to generate per batch. This can be useful for generating multiple texts in parallel. Default is `1`.
```bash
--batch_size 1
```

- `--length` (int):
The length of each generated text sequence in tokens. The default is half of the model’s maximum context size.
```bash
--length 50
```

- `--temperature` (float):
Controls the randomness of predictions by scaling the logits before applying softmax. A higher temperature (e.g., 1.0) results in more random completions, while lower values (e.g., 0.7) make the model more confident in its predictions (temperature and top-k are illustrated in the sketch after this list).
```bash
--temperature 0.7
```

- `--top_k` (int):
Limits the sampling to the top `k` most likely next words. The higher the value, the more diverse the output. Lower values restrict the model to fewer possible next words.
```bash
--top_k 40
```

- `--quiet` (bool):
If set to `True`, suppresses output (e.g., the generated text). Default is `False`.

- `--unconditional` (bool):
If set to `True`, generates text without a prompt. Otherwise, text is generated based on the provided `--text` input.

- `--seed` (int, optional):
A random seed for generating reproducible results. If not provided, a random seed will be used each time.
```bash
--seed 42
```

- `--stop_token` (str, optional):
A token at which text generation should stop (e.g., a special end-of-text token). Default is `None`.
```bash
--stop_token "<|endoftext|>"
```

- `--skip_special_tokens` (bool):
If set to `True`, skips special tokens (e.g., end-of-text markers) in the generated text. Default is `False`.
```bash
--skip_special_tokens True
```

- `--clean_up_tokenization_spaces` (bool):
If set to `True`, it removes unwanted spaces generated by the tokenizer. Default is `False`.
```bash
--clean_up_tokenization_spaces True
```

- `--do_sample` (bool):
If set to `True`, generates text using sampling. Otherwise, it will generate the most likely output using greedy decoding. Default is `True`.
```bash
--do_sample True
```

- `--num_return_sequences` (int):
Number of distinct sequences to generate for each input prompt. Default is `1`.
```bash
--num_return_sequences 3
```

- `--repetition_penalty` (float):
A penalty for repetition in the generated text. Higher values reduce repetition. Default is `1.0`.
```bash
--repetition_penalty 1.2
```

- `--length_penalty` (float):
A penalty applied to the length of the generated sequence (note: if this is passed through to Hugging Face's `generate`, it only affects beam-based decoding, and values above `0.0` favor longer sequences). Default is `1.0`.
```bash
--length_penalty 1.0
```
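
As a rough illustration (not code from this repository), here is a minimal PyTorch sketch of how `--temperature` and `--top_k` reshape the next-token distribution before sampling:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.7, top_k: int = 40) -> int:
    """Illustrative only: temperature-scale the logits, keep the top-k candidates, then sample."""
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    logits = logits / max(temperature, 1e-8)

    # Keep only the k most likely tokens; everything else gets (near-)zero probability.
    if top_k > 0:
        kth_best = torch.topk(logits, min(top_k, logits.size(-1))).values[-1]
        logits = torch.where(logits < kth_best, torch.full_like(logits, float("-inf")), logits)

    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

# Toy example with a hypothetical 10-token vocabulary:
print(sample_next_token(torch.randn(10), temperature=0.7, top_k=5))
```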
## Model Download

This script will attempt to download the pre-trained GPT-2 model automatically if it is not already available locally. The model is downloaded from Hugging Face's repository.
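
The repository's exact download code isn't shown here, but with the Hugging Face `transformers` library the usual pattern is roughly the following sketch (it assumes the standard `gpt2` checkpoints on the Hub; the checkpoint names below are the Hub's, not necessarily what `--param` maps to internally):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# "gpt2" is the 124M checkpoint on the Hugging Face Hub; the larger ones are
# "gpt2-medium" (355M), "gpt2-large" (774M) and "gpt2-xl" (1558M).
model_name = "gpt2"

# from_pretrained() downloads the files on first use and reuses the local cache afterwards.
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()
```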
## Example
To generate three text samples with the following parameters:
- Input text: "Once upon a time"
- Number of samples: 3
- Batch size: 1
- Length of generated text: 50 tokens
- Temperature: 0.7
- Top-k sampling: 40

Run the command:
```bash
python run.py --text "Once upon a time" --nsamples 3 --batch_size 1 --length 50 --temperature 0.7 --top_k 40 --param 124M
```
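
The internals of `run.py` aren't reproduced here, but the flags above correspond closely to Hugging Face's `generate()` API; a rough, hypothetical Python equivalent of this command (assuming the 124M `gpt2` checkpoint and the `transformers` library) might look like:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # 124M checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,              # --do_sample
        temperature=0.7,             # --temperature
        top_k=40,                    # --top_k
        max_new_tokens=50,           # roughly --length
        num_return_sequences=3,      # --nsamples / --num_return_sequences
        pad_token_id=tokenizer.eos_token_id,
    )

for sequence in outputs:
    print(tokenizer.decode(sequence, skip_special_tokens=True))
```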
## License

This project is licensed under the [MIT License](./LICENSE).
## Weights License
The pre-trained GPT-2 weights are released under the [GPT-2 license](https://huggingface.co/openai-community/gpt2/tree/main) on Hugging Face (also available on [GitHub](https://github.com/openai/gpt-2/blob/master/LICENSE)), which allows research and commercial use but imposes certain restrictions on model usage and the generation of harmful content.
## Credits
This implementation is inspired by and based on the work from the [GPT-2 repository](https://github.com/openai/gpt-2), which provides the foundational model and techniques for text generation. We have built upon these concepts to create this PyTorch-based version.
This implementation is also based on the "Attention is All You Need" paper, which introduced the Transformer architecture that serves as the foundation for GPT models.
You can access the paper here: [Attention is All You Need](https://arxiv.org/abs/1706.03762).