https://github.com/suvash/taxophoney
GPT (Decoder only Transformer - from scratch) generated fake/phoney taxonomies (based on NCBI taxonomy dataset)
generative-model gpt ncbi-taxonomy taxonomy transformer transformer-decoder
- Host: GitHub
- URL: https://github.com/suvash/taxophoney
- Owner: suvash
- License: apache-2.0
- Created: 2023-06-15T13:45:18.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-10T19:24:34.000Z (almost 2 years ago)
- Last Synced: 2025-01-02T12:34:00.768Z (6 months ago)
- Topics: generative-model, gpt, ncbi-taxonomy, taxonomy, transformer, transformer-decoder
- Language: Jupyter Notebook
- Homepage:
- Size: 64 MB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# taxophoney
Fake/phoney taxonomies generated by a GPT (decoder-only Transformer, written from scratch) trained on the NCBI taxonomy dataset, which is included in this repository.
## Requirements
- PyTorch 1.12.1+cu116 (with CUDA support, for reasonably short training runs)
## Quick training results
```bash
$ python gpt.py
Using device : cuda
step 0: train loss 4.4625, val loss 4.4653
step 500: train loss 2.0843, val loss 2.1280
step 1000: train loss 1.5394, val loss 1.5920
step 1500: train loss 1.3097, val loss 1.3789
step 2000: train loss 1.1842, val loss 1.2741
step 2500: train loss 1.1017, val loss 1.2182
step 3000: train loss 1.0408, val loss 1.1938
step 3500: train loss 0.9831, val loss 1.1692
step 4000: train loss 0.9382, val loss 1.1591
step 4500: train loss 0.8935, val loss 1.1392
step 4999: train loss 0.8545, val loss 1.1383
```
## Generated phoney taxonomy
The model training and sampling script can be used to train the model and then generate (sample) a large number of names. Some of the generated names are included in the [taxophoney.txt](taxophoney.txt) file in this repo.
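For readers unfamiliar with how a character-level model produces names, the following is a rough sketch of autoregressive sampling in plain Python. It is not the repository's code: `VOCAB`, `next_char_logits`, and `sample_name` are hypothetical names, the toy scoring function merely stands in for a trained Transformer, and treating a newline as the end-of-name marker is an assumption about the data layout.

```python
import math
import random

# Hypothetical toy vocabulary; a real model derives its vocabulary from the dataset.
VOCAB = list("abcdefghijklmnopqrstuvwxyz ()-") + ["\n"]


def next_char_logits(context):
    """Stand-in for a trained model: one unnormalised score per character.

    All characters are equally likely, except that a newline becomes
    more likely as the context grows, so sampled names tend to end.
    """
    logits = [0.0] * len(VOCAB)
    logits[VOCAB.index("\n")] = 0.5 * (len(context) - 15)
    return logits


def softmax(logits, temperature=1.0):
    """Turn scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_name(rng, max_chars=60, temperature=1.0):
    """Sample one character at a time, feeding the growing text back in as context."""
    out = ""
    for _ in range(max_chars):
        probs = softmax(next_char_logits(out), temperature)
        ch = rng.choices(VOCAB, weights=probs, k=1)[0]
        if ch == "\n":  # assumed end-of-name marker
            break
        out += ch
    return out


rng = random.Random(0)  # fixed seed so runs are repeatable
names = [sample_name(rng) for _ in range(3)]
for name in names:
    print(name)
```

With a trained model in place of the toy scoring function, the same loop would emit name-like strings instead of random letters. In PyTorch the sampling step is commonly done with `torch.multinomial` over softmax probabilities.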
## Bonus : Generated images out of the phoney names
Naturally, some of these names make one wonder what the creatures could look like. I've used the [Stable Diffusion v1-5 Model by RunwayML](https://huggingface.co/runwayml/stable-diffusion-v1-5) to generate images for some of the names. The generation prompt includes only the common name (inside the parentheses) and not the scientific name, since the scientific names didn't help produce plausible images.
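As a small illustration of pulling the common name out of a generated line, here is a minimal sketch that assumes each line has the form `Scientific name (common name)`, as in the headings in this section. The helper names `split_name` and `build_prompt` are hypothetical, and whether the actual prompts contained anything beyond the common name is not stated here.

```python
import re

# Scientific name first, then the common name inside the final pair of parentheses.
NAME_PATTERN = re.compile(r"^(?P<scientific>.+?)\s*\((?P<common>[^()]+)\)\s*$")


def split_name(line):
    """Return (scientific, common) for a generated line, or None if it has no parenthesised part."""
    match = NAME_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("scientific"), match.group("common")


def build_prompt(line):
    """Keep only the common name, which is the part intended for the image prompt."""
    parts = split_name(line)
    return None if parts is None else parts[1]


print(build_prompt("Rhodarius leyi (Leyn's land weaker caterpillar)"))
```

Lines that do not match the assumed format return `None`, so they can be skipped rather than sent to the image model.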
### Rhodarius leyi (Leyn's land weaker caterpillar)
### Oligops erythrotis (greater-cheeked of leaf-warbler)
### Ablenus amaratha (Golden-banded stone-eyellow bat)
### Chliostega sp. 'Nawatan (strawberry little emperor)
### Columbidium metulum (blotcheye columbing beetle)
### Gobionia rotalorum (round horned fringe-fingered gecko)
