# Docker Container for Training GPT-2

The only requirement is NVIDIA Docker (Docker with GPU support).
### clone repo

`git clone https://github.com/Danny-Dasilva/gpt2.git`

### cd into the repo

`cd gpt2`
### build docker container
`sudo docker build ./ -t gpt2`

### export current directory
`export GPT_DIR=$PWD`

### run the container (local files are mounted in the container's `local` folder)
```
docker run --name create-text \
--rm -it --privileged -p 6006:6006 \
--mount type=bind,src=${GPT_DIR},dst=/home/gpt2/local \
--gpus all \
gpt2
```

### second terminal if you need it

```
sudo docker exec -it create-text /bin/bash
```

## Train and Deploy model

### Add your text file
Put your .txt file in ${GPT_DIR} (the cloned repo); it will appear inside the container under `local/`.

The following examples use the example file `training.txt`.

If you want to know how to format the .txt file, see the section at the bottom of this README.
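For example, on the host (in the directory where you cloned the repo), copying a hypothetical corpus file into place looks like this; inside the container it then shows up as `local/training.txt`:

`cp ~/my_corpus.txt ${GPT_DIR}/training.txt`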

### Encode to training.npz

Run this inside the Docker container:

`python3 encode.py local/training.txt training.npz`
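
If you want to sanity-check the result, the `.npz` file is just a NumPy archive; each entry should be a 1-D array of GPT-2 token IDs. A minimal way to inspect it (assuming NumPy is available in the container):

```python
import numpy as np

# training.npz is a NumPy archive of token-ID arrays
data = np.load("training.npz")
for name in data.files:
    print(name, data[name].shape, data[name].dtype)
```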

### Train Model
The run name here is `run1`; this will also be the model name used later when generating samples.

`python3 train.py --dataset training.npz --run_name run1`

The model will save automatically at 1000 steps, but if you hit Ctrl+C it will save the most recent step. If you train for too short a time you can get a `ValueError("Can't load save_path when it is None.")`; this seems to be related to the tokens not delimiting correctly because the model is not trained enough.

### Run model
The command below generates unconditional samples, i.e. free-running generated text with no prompt.

`python3 generate_unconditional_samples.py --top_k 40 --model_name run1`
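
Samples are printed to stdout; if you want to keep them, you can write them into the mounted `local/` folder so they end up on the host, e.g.:

`python3 generate_unconditional_samples.py --top_k 40 --model_name run1 | tee local/samples.txt`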

### Interactive samples
This will ask you for a model prompt.

`python3 interactive_conditional_samples.py --top_k 40 --model_name run1 --length 25`

### model args explanations

taken from https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f

* **top_k**: Integer value controlling diversity. 1 means only 1 word is considered for each step (token), resulting in deterministic completions, while 40 means 40 words are considered at each step. 0 (default) is a special setting meaning no restrictions. 40 generally is a good value.

* **temperature**: Float value controlling randomness in the Boltzmann distribution. Lower temperature results in less random completions. As the temperature approaches zero, the model becomes deterministic and repetitive. Higher temperature results in more random completions. Default value is 1.
* **length**: number of tokens generated in each sample before a new sample is started (see the combined example below).
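Putting those together, a hypothetical invocation (the specific values here are just an example, not from the repo) might look like:

`python3 interactive_conditional_samples.py --model_name run1 --top_k 40 --temperature 0.8 --length 100`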

## How do you format txt files?

The original example delimits entries with `<|endoftext|>`, but you can use whatever delimiter you like.

Files should look like the example below:

```
Example sentence or paragraph of text
<|endoftext|>
Another text thing
<|endoftext|>
```
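
If you have many separate documents, a small script like the following sketch can stitch them into one training file. The `my_texts/` folder name is just an assumption for this example:

```python
from pathlib import Path

# Hypothetical folder of plain-text documents to combine (not part of the repo)
docs = sorted(Path("my_texts").glob("*.txt"))

with open("training.txt", "w", encoding="utf-8") as out:
    for doc in docs:
        # Write each document followed by the <|endoftext|> delimiter on its own line
        out.write(doc.read_text(encoding="utf-8").strip())
        out.write("\n<|endoftext|>\n")
```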