https://github.com/oelin/mlp-language-model
A 2M parameter neural language model trained on the TinyStories corpus.
- Host: GitHub
- URL: https://github.com/oelin/mlp-language-model
- Owner: oelin
- License: MIT
- Created: 2023-10-03T10:15:04.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-03T16:06:45.000Z (over 1 year ago)
- Last Synced: 2024-01-27T17:41:39.240Z (over 1 year ago)
- Topics: language-model, nlp
- Language: Python
- Homepage:
- Size: 35.5 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# MLP Language Model
A 2M parameter neural language model trained on the TinyStories corpus.
## Completions
Prompt: (empty)
> Once upon a time there was a little girl named Lily. She was very happy and she had a lot of fun with her friends. One day, Lily went to visit the kitchen. She was so happy that she didn't know what to do.
Prompt: (empty)
> Once there was a little boy named Timmy. Timmy loved to play outside with his friends. One day, he went to the park to play. Jack was playing in the park with his friends. He saw a big red car and his toy car. Timmy was happy to have the toy car.
Prompt: `When Alice saw Eve she said`
> When Alice saw Eve she said it was time to go home. She said goodbye to her mom and dad.
Prompt: `One day, the sun`
> One day, the sun was in the sky. It was a big, beautiful sky. The sun had many colors of the animals in the garden. The animals was so happy and thanked the rabbit for his own new place.
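The completions above come from sampling one token at a time. The repository's actual generation code is not shown here, so the following is only a minimal sketch of how a fixed-window model could be sampled. It assumes a 64-token context (inferred from `16384 / 256` in the architecture), left-padding with the `padding_idx` value `10002`, and a hypothetical `next_log_probs` callable standing in for the trained model. The helper names `build_context`, `sample_from_log_probs`, and `generate` are illustrative, not taken from the repository.

```python
import math
import random

CONTEXT_LEN = 64  # assumption: 16384 input features / 256 embedding dims
PAD_ID = 10002    # padding_idx from the printed Embedding layer


def build_context(token_ids, context_len=CONTEXT_LEN, pad_id=PAD_ID):
    """Keep the most recent tokens and left-pad up to the window size."""
    window = list(token_ids[-context_len:])
    return [pad_id] * (context_len - len(window)) + window


def sample_from_log_probs(log_probs, rng, temperature=1.0):
    """Draw one token id from a list of log-probabilities."""
    weights = [math.exp(lp / temperature) for lp in log_probs]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(prompt_ids, next_log_probs, steps, rng):
    """Append `steps` sampled tokens to the prompt, one at a time."""
    ids = list(prompt_ids)
    for _ in range(steps):
        context = build_context(ids)
        ids.append(sample_from_log_probs(next_log_probs(context), rng))
    return ids


def toy_model(context):
    """Hypothetical stand-in for the trained model: uniform over 5 tokens."""
    return [math.log(1 / 5)] * 5


sampled = generate([1, 2], toy_model, steps=10, rng=random.Random(0))
print(len(sampled), sampled[:2])
```

Under these assumptions, an empty prompt corresponds to a context made entirely of padding tokens, and prompts longer than the window are truncated to their most recent 64 tokens.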
## Architecture
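The layer sizes in the module summary for this section can be sanity-checked with a little arithmetic. The sketch below copies the sizes from that summary and counts weights and biases per layer. It assumes the MLP input is the concatenation of the token embeddings in the context window, which would imply a window of `16384 / 256 = 64` tokens; the source does not state this explicitly. The helper names are illustrative.

```python
# Layer sizes copied from the printed module summary.
VOCAB_SIZE = 10003   # Embedding rows, including the padding token
EMBED_DIM = 256      # Embedding columns
MLP_IN = 16384       # in_features of the first Linear layer
HIDDEN = (128, 64)   # out_features of the two hidden Linear layers


def linear_params(n_in, n_out):
    """Weights plus biases of a Linear layer with bias=True."""
    return n_in * n_out + n_out


def layer_norm_params(n):
    """Scale and shift vectors of a LayerNorm with elementwise_affine=True."""
    return 2 * n


def parameter_breakdown():
    counts = {"embedding": VOCAB_SIZE * EMBED_DIM}
    n_in = MLP_IN
    for i, n_out in enumerate(HIDDEN):
        counts[f"linear_{i}"] = linear_params(n_in, n_out)
        counts[f"layer_norm_{i}"] = layer_norm_params(n_out)
        n_in = n_out
    counts["output_linear"] = linear_params(n_in, VOCAB_SIZE)
    return counts


# Assumption: embeddings of all context tokens are flattened into one vector.
context_len = MLP_IN // EMBED_DIM

breakdown = parameter_breakdown()
for name, count in breakdown.items():
    print(f"{name:>14}: {count:>9,}")
print(f"{'total':>14}: {sum(breakdown.values()):>9,}")
print("implied context length:", context_len)
```

Counted this way, the first linear layer alone holds about 2.1M parameters and all listed layers together hold about 5.3M. How the 2M figure in the description was counted is not stated in the source.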
```
MLPLM(
(embedding): Embedding(10003, 256, padding_idx=10002)
(mlp): Sequential(
(0): Linear(in_features=16384, out_features=128, bias=True)
(1): ReLU()
(2): Dropout(p=0.05, inplace=False)
(3): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
(4): Linear(in_features=128, out_features=64, bias=True)
(5): ReLU()
(6): Dropout(p=0.05, inplace=False)
(7): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
(8): Linear(in_features=64, out_features=10003, bias=True)
(9): LogSoftmax(dim=-1)
)
)
```
## Performance
Training and validation loss for MLPLM-V3:
| # Sequences | NLL Loss (Train) | NLL Loss (Validation) |
|-------------|------------------|-----------------------|
| 0 | 8.94 | 8.93 |
| 1000 | 5.71 | 5.66 |
| 10000 | 4.29 | 4.28 |
| 20000 | 3.91 | 3.89 |
| 30000 | 3.88 | 3.86 |
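To put the losses in the table in context, they can be converted to perplexity and compared with a uniform-guess baseline. This is a rough sketch that assumes the reported NLL is the mean per-token loss in nats, which is what a `LogSoftmax` output combined with an averaged NLL loss would give; the source does not confirm the reduction or the units.

```python
import math

VOCAB_SIZE = 10003  # output size of the final Linear layer

# Validation NLL values copied from the table, keyed by number of sequences.
validation_nll = {0: 8.93, 1000: 5.66, 10000: 4.28, 20000: 3.89, 30000: 3.86}


def perplexity(nll_nats):
    """Perplexity is the exponential of the mean per-token NLL in nats."""
    return math.exp(nll_nats)


# NLL of a model that assigns equal probability to every vocabulary entry.
uniform_nll = math.log(VOCAB_SIZE)

print(f"uniform baseline NLL: {uniform_nll:.2f}")
for n_sequences, nll in validation_nll.items():
    print(f"{n_sequences:>6} sequences: NLL {nll:.2f}, perplexity {perplexity(nll):.1f}")
```

Under that assumption, the loss at 0 sequences sits close to the uniform baseline of about 9.21, as expected for an untrained model, and the final validation loss of 3.86 corresponds to a perplexity of roughly 47.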