Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Character-level LSTM from scratch, with clear backprop.
- Host: GitHub
- URL: https://github.com/eduardoleao052/lstm-from-scratch
- Owner: eduardoleao052
- Created: 2023-05-24T20:09:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-10T19:50:01.000Z (7 months ago)
- Last Synced: 2024-04-10T22:52:26.682Z (7 months ago)
- Topics: deep-learning, machine-learning, natural-language-processing
- Language: Python
- Homepage:
- Size: 9.61 MB
- Stars: 2
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Educational LSTM From Scratch in Vanilla Python
- Use this repo to __train and test your own RNN and LSTM__.
- You can train and fine-tune a model on any text file, and it will generate text that sounds like it.
- The LSTM layers, __with full forward and backprop__, are in [layers_torch.py](layers_torch.py).

## 1. Project Structure
- `numpy_implementations/` : Folder with the model and every layer implemented from scratch using only numpy.
- `data/` : Folder to store the text file. Currently holds `shakespeare.txt` (the default).
- `models/` : Folder that stores the saved models. Further explanation in section 2.
- `config.py` : File with all model configuration. Edit this file to alter model layers and hyperparameters.
- `torch_layers.py` : File containing every layer of the LSTM. Each layer is a class with a `.forward` and a `.backward` method (see the sketch after this list).
- `torch_model.py` : File with the `Model` class.
- `run.py` : Script run by the `./run.sh` command. Trains the model.
- `utils.py` : File with helper functions and classes.
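Every layer in the files above exposes the same `.forward`/`.backward` contract. As a rough illustration of that contract, here is a minimal numpy sketch of a timestep-wise dense layer; the parameter names and shapes are assumptions for illustration, not the repository's actual code:
```
# Illustrative sketch only -- NOT the repository's code. Shows the
# .forward/.backward contract: forward caches its input, backward
# returns the gradient with respect to that input.
import numpy as np

class TemporalDense:
    """Fully-connected layer applied independently at every timestep."""

    def __init__(self, in_size, out_size):
        self.W = np.random.randn(in_size, out_size) * 0.01
        self.b = np.zeros(out_size)

    def forward(self, x):
        # x: (batch, timesteps, in_size)
        self.x = x                        # cache for the backward pass
        return x @ self.W + self.b

    def backward(self, dout):
        # dout: gradient of the loss w.r.t. this layer's output
        self.dW = np.einsum('bti,bto->io', self.x, dout)
        self.db = dout.sum(axis=(0, 1))
        return dout @ self.W.T            # gradient w.r.t. the input
```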
## 2. Running it Yourself
### Requirements
- The required packages are listed in `requirements.txt`. The numpy-based implementations of the layers are in the `numpy_implementations` folder, in `layers.py` and `model.py`, and the torch implementation is in `layers_torch.py` and `model_torch.py`.
- The torch version is a little faster and is the one used by `run.py`. The numpy files are included for educational purposes only.
- To set up and activate a Miniconda virtual environment, run in a terminal:
```
conda create -n environment_name python=3.8
conda activate environment_name
```
- The requirements can then be installed inside the virtual environment with:
```
pip install -r requirements.txt
```
- To run, install the requirements and provide a text corpus (any text you want the model to imitate, in `.txt` format).
- Place your text file in the `data` directory.
### Pretraining
- To pretrain an RNN on language modeling (predicting the next character), first go into `config.py` and choose the necessary arguments.
- In the `training_params` dictionary, choose (a sketch of this dictionary follows the list):
- `--corpus` (name of file in data directory with the text you want to train the model on)
- `--to_path` (.json file that will be created to store the model) [OPTIONAL]
- And you can choose the hyperparameters (although the defaults work pretty well):
- `n_iter` (number of times the model will run a full sequence during training)
- `n_timesteps` (number of characters the model will see/predict on each iteration in `n_iter`)
- `batch_size` (number of parallel iterations the model will run)
- `learning_rate` (scalar regulating how quickly model parameters change. Should be smaller for fine-tuning)
- `regularization` (scalar regulating size of weights and overfitting) [OPTIONAL]
- `patience` (after how many iterations without improvement should the learning rate be reduced) [OPTIONAL]
- Under `model_layers`, you can choose whatever configuration works best. Usually, layers with more parameters require larger text files to avoid overfitting and repetitive outputs.
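As a reference point, a `training_params` dictionary along the lines described above might look roughly like this (a hypothetical sketch assembled from the key descriptions; the values shown are illustrative, not the repository's defaults):
```
# Hypothetical sketch of training_params in config.py -- key names follow
# the descriptions above; values are illustrative only.
training_params = {
    '--corpus': 'shakespeare.txt',        # text file inside data/
    '--to_path': 'models/my_model.json',  # optional: where to save the model
    'n_iter': 1500,                       # full-sequence training iterations
    'n_timesteps': 100,                   # characters seen/predicted per iteration
    'batch_size': 16,                     # parallel sequences per iteration
    'learning_rate': 1e-3,                # use a smaller value for fine-tuning
    'regularization': 1e-4,               # optional weight penalty
    'patience': 5,                        # optional: iterations before reducing the LR
}
```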
- Finally, simply run in a terminal:
```
python3 run.py --train --config=config.py
```
- Whenever you feel the samples are good enough, you can kill the training at any time. This will NOT corrupt the saved model `.json` file, and you can proceed to testing and fine-tuning on smaller datasets.
> **Note:** For pretraining, a really large text corpus is usually necessary. I obtained good results with ~1M characters. If you want to alter layers/dimensions, do so in the `config.py` file, as described in the __Build a custom Model__ section.
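The saved model is just a `.json` file; conceptually, saving amounts to converting each weight array to nested lists and dumping them. The snippet below only illustrates that idea (the function names and the flat `{name: array}` layout are assumptions, not the repository's actual saving code):
```
# Illustration only -- not the repository's saving code.
import json
import numpy as np

def save_weights(params, path):
    # numpy arrays are not JSON-serializable, so convert them to nested lists
    with open(path, 'w') as f:
        json.dump({name: w.tolist() for name, w in params.items()}, f)

def load_weights(path):
    # rebuild numpy arrays from the stored nested lists
    with open(path) as f:
        return {name: np.array(w) for name, w in json.load(f).items()}
```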
### Fine-Tuning
- To fine-tune an RNN on a given text file, go to `config.py` and choose the arguments:
- In the `fine_tuning_params` dictionary, choose (a sketch follows the list):
- `--corpus` (name of file in data directory with the text you want to train the model on)
- `--from_path` (.json file that contains pretrained model)
- `--to_path` (.json file that will be created to store the model) [OPTIONAL]
- And you can choose the hyperparameters (although the defaults work pretty well):
- `n_iter` (number of times the model will run a full sequence during training)
- `n_timesteps` (number of characters the model will see/predict on each iteration in `n_iter`)
- `batch_size` (number of parallel iterations the model will run)
- `learning_rate` (scalar regulating how quickly model parameters change)
- `regularization` (scalar regulating size of weights and overfitting) [OPTIONAL]
- `patience` (after how many iterations without improvement should the learning rate be reduced) [OPTIONAL]
- `model_layers` will not be accessed during fine-tuning, as the layers of the pretrained model will be automatically loaded.
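A `fine_tuning_params` dictionary along these lines might look as follows (again a hypothetical sketch; in spirit, only `--from_path` and the smaller learning rate differ from the pretraining example):
```
# Hypothetical sketch of fine_tuning_params -- values are illustrative only.
fine_tuning_params = {
    '--corpus': 'bee_gees.txt',              # a smaller corpus is fine for fine-tuning
    '--from_path': 'models/my_model.json',   # pretrained model to load
    '--to_path': 'models/my_model_ft.json',  # optional: where to save the result
    'n_iter': 300,
    'n_timesteps': 100,
    'batch_size': 16,
    'learning_rate': 1e-4,                   # smaller than for pretraining
}
```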
- Finally, simply run in a terminal:
```
python3 run.py --fine_tune --config=config.py
```
> **Note:** For fine-tuning, you can get adventurous with smaller text files. I obtained really nice results with ~10K characters, such as a small Shakespeare dataset and Bee Gees' songs.
### Testing
- To test your RNN, go to `config.py` and choose the arguments:
- In the `testing_params` dictionary, choose (a sketch appears after the test command below):
- `--from_path` (.json file that contains pretrained model)
- `--sample_size` (how many characters will be generated, "sounding" like the source text) [OPTIONAL]
- `--seed` (the start of the string your model generates; it has to "continue" this seed) [OPTIONAL]
> **Note:** The testing script does not access any hyperparameters, because the model is already trained.
- `model_layers` will not be accessed during testing, as you will use the layers of the pretrained model.
- Finally, simply run in a terminal:
```
python3 run.py --test --config=config.py
```
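For completeness, the `testing_params` dictionary might look like this (a hypothetical sketch using only the keys described above):
```
# Hypothetical sketch of testing_params -- values are illustrative only.
testing_params = {
    '--from_path': 'models/my_model.json',  # trained model to sample from
    '--sample_size': 500,                   # optional: number of characters to generate
    '--seed': 'ROMEO:',                     # optional: prompt the model will continue
}
```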
### Build a custom Model
- To customize the model layers, go into `config.py` and edit the `model_layers` dictionary.
- Each layer takes as arguments the input and output sizes.
- You may choose among the following layers:
- `Embedding` (turns input indexes into vectors)
- `TemporalDense` (simple fully-connected layer)
- `RNN` (Recurrent Neural Network layer)
- `RNNBlock` (RNN + TemporalDense with residual connections)
- `LSTM` (Long Short Term Memory layer)
- `TemporalSoftmax` (returns probabilities for next generated character)
> **Note:** The first layer must be an `Embedding` layer whose input size equals `vocab_size`. The last layer must be a `TemporalSoftmax` layer, with the previous layer's output size equal to `vocab_size`. Training detects CUDA availability by default and runs on CUDA if it is found.
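Putting the constraints above together, a `model_layers` dictionary could be sketched like this (hypothetical keys and hidden sizes; only the layer names and the `vocab_size` constraints come from the list and note above):
```
# Hypothetical sketch of model_layers in config.py -- keys and hidden sizes
# are illustrative. Embedding comes first (input = vocab_size); TemporalSoftmax
# comes last, fed by a layer whose output size equals vocab_size.
model_layers = {
    'embedding': Embedding(vocab_size, 256),
    'lstm': LSTM(256, 512),
    'dense': TemporalDense(512, vocab_size),
    'softmax': TemporalSoftmax(vocab_size, vocab_size),
}
```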
## 3. Results

- The Recurrent Neural Network implementation in `main.py` achieved a loss of 1.42 with a vocabulary size of 78, training on the tiny shakespeare corpus in `shakespeare.txt`.
```
CORIOLANUS:
I am the guilty of us, friar is too tate.
QUEEN ELIZABETH:
You are! Marcius worsed with thy service, if nature all person, thy tear. My shame;
I will be deaths well; I say
Of day, who nay, embrace
The common on him;
To him life looks,
Yet so made thy breast,
From nightly:
Stand good.
BENVOLIO:
Why, whom I come in his own share; so much for it;
For that O, they say they shall, for son that studies soul
Having done,
And this is the rest in this in a fellow.
```
> **Note:** Results achieved with the model configuration exactly as presented in this repo.
> The training took ~1h and 1500 steps.
- The Long Short Term Memory (LSTM) implementation, using LSTMs instead of RNNs, achieved a loss of 1.32 with a vocabulary size of 78, training on the tiny shakespeare corpus in `shakespeare.txt`.
```
HERMIONE:
Of all the sin of the hard heart; and hence,
For all the blessing from the king.
QUEEN ELIZABETH:
Ah, that away?
HERMIONE:
I'll go along.
QUEEN ELIZABETH:
Thou wear'st out yourself, and indeed Edward,
and his hours' vent, O why, away.
```
> **Note:** Training times seemed to be a little faster on GPU (GTX 1070 vs. M2 CPU), but the improvement was not dramatic (perhaps due to the iterative, non-parallelizable nature of RNNs).
> The training took ~2h30 and 1500 steps.
- Thanks for reading!