{"id":20958867,"url":"https://github.com/hayden-donnelly/gpt-tensorflow","last_synced_at":"2025-08-21T11:14:49.975Z","repository":{"id":176333353,"uuid":"615475483","full_name":"hayden-donnelly/gpt-tensorflow","owner":"hayden-donnelly","description":"Tensorflow implementation of the original GPT.","archived":false,"fork":false,"pushed_at":"2024-01-31T04:06:14.000Z","size":529,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-19T01:52:11.929Z","etag":null,"topics":["keras","natural-language-generation","natural-language-processing","tensorflow"],"latest_commit_sha":null,"homepage":"https://openai.com/research/language-unsupervised","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hayden-donnelly.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-17T19:33:32.000Z","updated_at":"2024-07-17T07:03:38.000Z","dependencies_parsed_at":"2024-11-19T01:50:12.779Z","dependency_job_id":"a0878cc9-a499-40e4-bebe-49499b12c1ee","html_url":"https://github.com/hayden-donnelly/gpt-tensorflow","commit_stats":null,"previous_names":["hayden-donnelly/gpt-tensorflow"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hayden-donnelly%2Fgpt-tensorflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hayden-donnelly%2Fgpt-tensorflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hayden-donnelly%2Fgpt-tensorflow/re
leases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hayden-donnelly%2Fgpt-tensorflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hayden-donnelly","download_url":"https://codeload.github.com/hayden-donnelly/gpt-tensorflow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234715649,"owners_count":18875908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["keras","natural-language-generation","natural-language-processing","tensorflow"],"created_at":"2024-11-19T01:50:05.558Z","updated_at":"2025-01-20T00:13:55.642Z","avatar_url":"https://github.com/hayden-donnelly.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gpt-tensorflow\nA TensorFlow implementation of the original GPT.\n\nThis code is based on my interpretation of the paper. You can find the authors' original code \n[here](https://github.com/openai/finetune-transformer-lm). Note that I have omitted downstream \nfinetuning in this implementation.\n\n\u003cimg src=\"./images/gpt-architecture.png\" width=\"270px\"\u003e\u003c/img\u003e\n\n## Script/Path Assumption\nAll scripts assume that you are running them from the root of this repository and will produce errors\notherwise. For example, running ``python scripts/train.py`` from ``gpt-tensorflow`` works,\nbut running ``python train.py`` from ``gpt-tensorflow/scripts`` does not. The same goes for any paths you specify\nin script arguments. 
For example, ``data/tiny_shakespeare.txt`` is correct while \n``../data/tiny_shakespeare.txt`` is not.\n\n## Getting Started\n\n1. Build Docker image:\n```\nbash docker_build.sh\n```\n\n2. Start Docker container:\n```\nbash docker_run.sh\n```\n\n3. Tokenize text:\n```\npython scripts/tokenize.py --no_spacy --text_path data/tiny_shakespeare.txt\n```\n\n4. Train a model:\n```\npython scripts/train.py --num_attention_heads 6 --num_blocks 6 --feed_forward_dim 1024 --batch_size 1\n```\n\n5. Sample the trained model (work in progress):\n```\npython scripts/sample.py --prompt \"We know what we are, but know not what we may be.\"\n```\n\n## Tokenization\n\nThe ``tokenizer.py`` script offers two tokenization schemes: spaCy tokenization and character-wise \ntokenization. The default is spaCy tokenization, which is also the scheme used in the original paper. However, \nthe large vocabulary it generates may make it too memory intensive for some machines. In that case, \nuse character-wise tokenization instead. To switch to character-wise tokenization, call ``tokenizer.py`` with \nthe argument ``--no_spacy``.\n\nExample: \n```\npython scripts/tokenizer.py --no_spacy\n```\n\nFull list of parameters:\n- ``--no_spacy`` : Use the character-wise tokenizer instead of the spaCy tokenizer.\n- ``--text_path`` : Path to the text file to be tokenized. Default: ``data/tiny_shakespeare.txt``\n\n## Training \n\nTo train the model, call ``scripts/train.py``. All of the model parameters default to those \noutlined in the original paper, but you can override them by adding ``--\u003cparameter_name\u003e \u003cparameter_value\u003e`` \narguments when calling ``scripts/train.py``. \n\nExample: \n```\npython scripts/train.py --attention_dim 512\n```\n\nFull list of parameters:\n\n- ``--no_spacy`` : Use the character-wise tokenizer instead of the spaCy tokenizer.\n- ``--num_blocks`` : Number of transformer blocks. 
Default: ``12``\n- ``--num_attention_heads`` : Number of attention heads in multi-head attention. Default: ``12``\n- ``--context_size`` : Number of tokens in each context. Default: ``512``\n- ``--attention_dim`` : Dimension of attention layers. Default: ``768``\n- ``--feed_forward_dim`` : Dimension of feed-forward layers. Default: ``3072``\n- ``--activation`` : Activation function. Default: ``gelu``\n- ``--token_embed_dim`` : Dimension of token embeddings. Default: ``384``\n- ``--dropout`` : Dropout rate. Default: ``0.1``\n- ``--epochs`` : Number of epochs. Default: ``100``\n- ``--batch_size`` : Batch size. Default: ``64``\n\nFor training on a single consumer-grade GPU, you'll need to nerf the model a bit. Below are two different configurations that work on an RTX 3070 Ti.\n\nNerfed batch size (slow training, but model parameters are the same as in the original paper):\n```\npython scripts/train.py --batch_size 1 --epochs 1\n```\n\nNerfed model (faster training, but model parameters are different from the original paper):\n```\npython scripts/train.py --num_attention_heads 6 --num_blocks 6 --feed_forward_dim 1024\n```\n\n## Docker Container\nBuilding image:\n```\nbash docker_build.sh\n```\n\nStarting container:\n```\nbash docker_run.sh\n```\n\n## Citation\n\n```bibtex\n@misc{radford_narasimhan_salimans_sutskever, \n    title={Improving Language Understanding by Generative Pre-Training}, \n    author={Alec Radford and Karthik Narasimhan and Tim Salimans and Ilya Sutskever},\n    url={https://openai.com/research/language-unsupervised}\n} \n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhayden-donnelly%2Fgpt-tensorflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhayden-donnelly%2Fgpt-tensorflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhayden-donnelly%2Fgpt-tensorflow/lists"}