https://github.com/evilfreelancer/rugpt3-custom
Pre-training custom ruGPT3 model on books written by F.M. Dostoevski
dataset dostoevsky gpt prediction rugpt training transformers
- Host: GitHub
- URL: https://github.com/evilfreelancer/rugpt3-custom
- Owner: EvilFreelancer
- License: mit
- Created: 2023-02-18T11:27:03.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-09-03T01:23:50.000Z (almost 3 years ago)
- Last Synced: 2025-07-26T11:40:40.059Z (11 months ago)
- Topics: dataset, dostoevsky, gpt, prediction, rugpt, training, transformers
- Language: Python
- Homepage: https://t.me/evilfreelancer
- Size: 4.13 MB
- Stars: 7
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Tuned ruGPT3 on custom data
The following was used as initial data:
* Archive with digitized books by F.M. Dostoevsky
* Model ruGPT3small
The model was trained for five epochs, producing a model file of approximately 600 megabytes.
That file has been uploaded to Hugging Face and can also be used locally for testing.
> Details here: https://dzen.ru/a/ZHTfs9pggmVlGC79 (in Russian)
## Requirements
If you prefer the Docker way:
* Docker Engine
* Docker Compose
* Docker Nvidia Runtime
* CUDA 11.7
or if you prefer to install everything manually:
* Python 3.10
* CUDA 11.7
* NVCC
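
Before starting the manual install, it can help to check the two requirements that are easy to verify from a script. The following is only a minimal sketch: the function name `check_manual_requirements` and the result keys are hypothetical, and treating "Python 3.10" as an exact version match is an assumption.

```python
import shutil
import sys


def check_manual_requirements(required_python=(3, 10)):
    """Return a dict mapping each checked requirement to True/False.

    Assumption: the README's "Python 3.10" is treated as an exact match.
    """
    return {
        # Compare only (major, minor), ignoring the patch release.
        "python_3_10": sys.version_info[:2] == required_python,
        # NVCC is considered available if the `nvcc` binary is on PATH.
        "nvcc_on_path": shutil.which("nvcc") is not None,
    }


for name, ok in check_manual_requirements().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

This does not verify the CUDA version itself; running `nvcc --version` by hand is still needed for that.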
## How it was made
As a first step, I searched GitHub for projects that had trained a custom
ruGPT3 model on arbitrary text data.
I found the [K7chyp/DostoevskyDoesntWriteIt](https://github.com/K7chyp/DostoevskyDoesntWriteIt) project, studied its
sources, and extracted the commands, the logic, and the prepared text dataset.
The most important parts were copied into the [train.sh](train.sh) and [prompt.sh](prompt.sh) scripts.
In essence, these were just the Python scripts for running pre-training and for using a pre-trained model, taken from the original
ruGPT3 by [AI Forever](https://github.com/ai-forever/ru-gpts).
Next, I tried to train my own model with the default parameters passed to `pretrain_transformers.py` and
ran into the limits of my graphics card: the 8 GB of VRAM on my Nvidia RTX 3050 was not enough.
After several unsuccessful attempts, I found that the `block_size` parameter affects the amount
of memory used during model training. I therefore reduced it from 2048 to 512, after which training completed
without errors.
Finally, I created the Dockerfile and docker-compose.yml, and the project was done.
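
To give a feel for why reducing `block_size` helps, here is a rough back-of-the-envelope sketch. It only counts the attention score matrices, which grow with the square of the sequence length. The layer and head counts (12 each) and the 4-byte values are assumptions based on a GPT-2-small-like architecture, not numbers taken from this repository, and real training memory also includes weights, gradients, optimizer state, and other activations.

```python
def attention_scores_bytes(block_size, n_layers=12, n_heads=12,
                           batch_size=1, bytes_per_value=4):
    """Bytes for one (block_size x block_size) score matrix per head per layer.

    All defaults are illustrative assumptions, not values from the repo.
    """
    return (batch_size * n_layers * n_heads
            * block_size * block_size * bytes_per_value)


full = attention_scores_bytes(2048)
reduced = attention_scores_bytes(512)

print(f"block_size=2048: {full / 2**20:.0f} MiB")     # 2304 MiB
print(f"block_size=512:  {reduced / 2**20:.0f} MiB")  # 144 MiB
print(f"reduction: {full // reduced}x")               # 16x
```

Under these assumptions, cutting `block_size` by a factor of 4 shrinks this part of the memory footprint by a factor of 16.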
## How to install
Clone the repository, then change into the source root:
```shell
git clone --recursive git@github.com:EvilFreelancer/rugpt3-custom.git
cd rugpt3-custom
```
### The Docker way
Copy config:
```shell
cp docker-compose.dist.yml docker-compose.yml
```
Build and start:
```shell
docker-compose build
docker-compose up -d
```
Enter the container:
```shell
docker-compose exec app bash
```
### Manually
```shell
# Install packages
apt-get install -y software-properties-common curl build-essential git
# Install Rust (use ${HOME}: a quoted "~" is not expanded by the shell)
export PATH="${HOME}/.cargo/bin:${PATH}"
curl https://sh.rustup.rs -sSf | bash -s -- -y
# Install packages required for Apex
pip install packaging==23.0 torch==1.13.1+cu117 -f https://download.pytorch.org/whl/torch_stable.html
# Download and build Apex
export CUDA_HOME=/usr/local/cuda
git clone https://github.com/NVIDIA/apex.git
cd ./apex && git checkout 8b7a1ff183741dd8f9b87e7bafd04cfde99cea28 && cd ..
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex
# Install ru-gpts
git clone https://github.com/EvilFreelancer/ru-gpts.git ru_gpts
# Install other dependencies
pip install -r requirements.txt
# For ruGPT3XL, use the requirements-xl.txt file
pip install -r requirements-xl.txt
```
## How to train (optional)
First, create the training and validation data from [output.csv](./data/output.csv) by running:
```shell
python3 prepare.py
```
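
The contents of `prepare.py` are not shown here. As a rough illustration of what such a split step might look like, the sketch below reads a CSV and writes separate training and validation text files. The function name `split_csv`, the `text` column name, the 10% validation share, and the output file names are all hypothetical, not taken from the repository.

```python
import csv
import random
from pathlib import Path


def split_csv(csv_path, out_dir, text_column="text",
              valid_fraction=0.1, seed=42):
    """Split one CSV text column into train.txt and valid.txt.

    Column name, fraction, and file names are illustrative assumptions.
    Returns (number of training rows, number of validation rows).
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [(row.get(text_column) or "").strip()
                for row in csv.DictReader(f)]
    rows = [r for r in rows if r]  # drop empty entries

    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_valid = max(1, int(len(rows) * valid_fraction))
    valid, train = rows[:n_valid], rows[n_valid:]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "train.txt").write_text("\n".join(train) + "\n", encoding="utf-8")
    (out / "valid.txt").write_text("\n".join(valid) + "\n", encoding="utf-8")
    return len(train), len(valid)
```

A call such as `split_csv("data/output.csv", "data")` would then produce the two files next to the source CSV, assuming the CSV really has a `text` column.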
Then run the training script:
```shell
./train.sh
```
Then wait for a while.
Training on my Nvidia RTX 3050 took about 35 minutes, with the GPU temperature at 64 °C.
## How to use
To use your own trained model, run the following script:
```shell
./prompt.sh
```
To use my pre-trained model uploaded to Hugging Face instead:
```shell
./prompt.hf.sh
```
After the model is loaded, you will see a command-line prompt. Type a phrase and wait for the result.
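
As an alternative to the shell scripts, the uploaded model can probably also be called directly from Python. The sketch below is an assumption, not the method used by `prompt.hf.sh`: it assumes the Hugging Face repository `evilfreelancer/dostoevsky_doesnt_write_it` (from the Links section) loads with the standard `transformers` auto classes and that a reasonably recent `transformers` release is installed. The function name `generate` and the sampling settings are hypothetical.

```python
def generate(prompt,
             model_name="evilfreelancer/dostoevsky_doesnt_write_it",
             max_new_tokens=100):
    """Continue `prompt` with the fine-tuned model (illustrative sketch).

    Assumes the model repo is loadable via the transformers auto classes.
    """
    # Imported lazily so that defining this function does not require
    # transformers to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample instead of greedy decoding
        top_k=50,        # assumed sampling settings, not from the repo
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Однажды вечером")` downloads the model on first use and returns the prompt followed by the generated continuation.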
## A few examples
```
Москва, 19 июня /<18>69. <…> У меня, например, есть один приятель, очень умный человек, но которого я непонимаю. Он
говорит мне: –Знаете, Лев Николаич, я давно уже вас презирал, но вы, как человек умный, меня никогда не могли обидеть…
```
```
Однажды вечером, за обедом, я вдруг увидал, что у меня как будто все лицо изменяется: глаза смыкались, губы двигались;
нос тоже становился тоньше и суше, глаза сверкали и сверкали,– точно я что‑то предчувствовал и предугадывал. Я тотчас
же подошел к нему, поздоровался с ним, но он не ответил мне и только молча указал мне на стул, где я сидел. Я сел и
тотчас же опять начал его разглядывать. Он тотчас же потупил глаза и с минуту сидел неподвижно.
```
```
Меж тем он стал меня допрашивать. –Ну, что же?– сказал я ему,– что же? –А вот-с, что же-с!– отвечал он,– что же-с,
что ж? –А вот что, Марья Александровна, что ж?– сказал я, немного покраснев от гнева,– что ж, что же? что же? –Ах,
боже мой! Да ведь это все пустяки-с.
```
## Links
* https://dzen.ru/a/ZHTfs9pggmVlGC79
* https://huggingface.co/evilfreelancer/dostoevsky_doesnt_write_it
* https://github.com/K7chyp/DostoevskyDoesntWriteIt/
* https://github.com/ai-forever/ru-gpts
* https://github.com/GraphGrailAi/ruGPT3-ZhirV/