{"id":15829150,"url":"https://github.com/AdityaNG/kan-gpt","last_synced_at":"2025-10-16T21:31:32.530Z","repository":{"id":237711517,"uuid":"795106410","full_name":"AdityaNG/kan-gpt","owner":"AdityaNG","description":"The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling","archived":false,"fork":false,"pushed_at":"2024-11-25T00:23:32.000Z","size":3200,"stargazers_count":711,"open_issues_count":6,"forks_count":54,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-01-30T07:25:25.618Z","etag":null,"topics":["gpt","kanformers","kolmogorov-arnold-networks","kolmogorov-arnold-representation","llm","text-generation","transformers"],"latest_commit_sha":null,"homepage":"https://adityang.github.io/kan-gpt/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AdityaNG.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["AdityaNG"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2024-05-02T15:41:42.000Z","updated_at":"2025-01-16T10:05:19.000Z","dependencies_parsed_at":"2024-05-29T06:44:44.192Z","dependency_job_id":"00ee6736-38b3-48ba-9505-846a1bf91124","html_url":"https://github.com/AdityaNG/kan-gpt","commit_stats":null,"previous_names":["adityang/kan-gpt"],"tags_count":15,"template":false,"template_full_name":"rochacbruno/python-project-template","repository_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdityaNG%2Fkan-gpt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdityaNG%2Fkan-gpt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdityaNG%2Fkan-gpt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdityaNG%2Fkan-gpt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AdityaNG","download_url":"https://codeload.github.com/AdityaNG/kan-gpt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236749064,"owners_count":19198617,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gpt","kanformers","kolmogorov-arnold-networks","kolmogorov-arnold-representation","llm","text-generation","transformers"],"created_at":"2024-10-05T11:00:40.360Z","updated_at":"2025-10-16T21:31:27.229Z","avatar_url":"https://github.com/AdityaNG.png","language":"Python","funding_links":["https://github.com/sponsors/AdityaNG"],"categories":["Python","A01_文本生成_文本对话","Project"],"sub_categories":["其他_文本生成_文本对话","Alternative"],"readme":"# KAN-GPT\n\n![PyPI - Downloads](https://img.shields.io/pypi/dm/kan-gpt)\n[![PyPI - 
Version](https://img.shields.io/pypi/v/kan-gpt)](https://pypi.org/project/kan-gpt/)\n[![codecov](https://codecov.io/gh/AdityaNG/kan-gpt/branch/main/graph/badge.svg?token=kan-gpt_token_here)](https://codecov.io/gh/AdityaNG/kan-gpt)\n[![CI](https://github.com/AdityaNG/kan-gpt/actions/workflows/main.yml/badge.svg)](https://github.com/AdityaNG/kan-gpt/actions/workflows/main.yml)\n[![GitHub License](https://img.shields.io/github/license/AdityaNG/kan-gpt)](https://github.com/AdityaNG/kan-gpt/blob/main/LICENSE)\n\n\nThe PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling\n\n## Install it from PyPI\n\n```bash\npip install kan_gpt\n```\n\n## Citation\n\nIf you find our work useful, cite us!\n\n```\n@misc{GANESH2024KANGPT,\n  author       = {Aditya Nalgunda Ganesh},\n  title        = {KAN-GPT: The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling},\n  year         = {2024},\n  month        = {May},\n  note         = {Release 1.0.0, 9th May 2024},\n  url          = {https://github.com/AdityaNG/kan-gpt/}\n}\n```\n\n## Usage\n\nRefer to the [KAN_GPT.ipynb](https://github.com/AdityaNG/kan-gpt/blob/main/KAN_GPT.ipynb) and [kan_gpt/prompt.py](https://github.com/AdityaNG/kan-gpt/blob/main/kan_gpt/prompt.py) for usage examples. 
The following is an outline of how to use the model:\n\n```py\nimport torch\nfrom kan_gpt.model import GPT\nfrom transformers import GPT2Tokenizer\n\nmodel_config = GPT.get_default_config()\nmodel_config.model_type = \"gpt2\"\nmodel_config.vocab_size = 50257\nmodel_config.block_size = 1024\nmodel = GPT(model_config)\n\ntokenizer = GPT2Tokenizer.from_pretrained('gpt2')\n\nprompt = \"Bangalore is often described as the \"\n\nprompt_encoded = tokenizer.encode(\n  text=prompt, add_special_tokens=False\n)\n\nx = torch.tensor(prompt_encoded).unsqueeze(0)\n\nmodel.eval()\ny = model.generate(x, 50)  # sample 50 tokens\n\nresult = tokenizer.decode(y[0])\n\nprint(result)\n\n# Bangalore is often described as the Silicon Valley of India.\n# The city has witnessed rapid growth in the past two decades.....\n```\n\n## Setup for Development\n\n```bash\n# Download Repo\ngit clone https://github.com/AdityaNG/kan-gpt\ncd kan-gpt\ngit pull\n\n# Download Dataset\npython3 -m kan_gpt.download_dataset --dataset tinyshakespeare\npython3 -m kan_gpt.download_dataset --dataset mnist\npython3 -m kan_gpt.download_dataset --dataset webtext\n\n# Install dependencies for development\npip install -r requirements.txt\npip install -e .\n```\n\n## Train\n\nRun the following dummy-dataset commands to check that everything works as expected:\n```bash\nWANDB_MODE=offline CUDA_VISIBLE_DEVICES=\"\" python3 -m kan_gpt.train --architecture MLP --batch_size 1 --dummy_dataset --device cpu --max_iters 200\nWANDB_MODE=offline CUDA_VISIBLE_DEVICES=\"\" python3 -m kan_gpt.train --architecture KAN --batch_size 1 --dummy_dataset --device cpu --max_iters 200\n```\n\nThen run the training script:\n```bash\npython -m kan_gpt.train\n```\n\n## Prompt\n\nYou can prompt the model to produce text as follows:\n```bash\npython -m kan_gpt.prompt --prompt \"Bangalore is often described as the \" --model_path (checkpoint)\n```\n\n## Results\n\nWe train and compare KAN-GPT with an equivalent MLP-GPT model on the Tiny Shakespeare dataset. 
We observe that KAN-GPT performs slightly better than MLP-GPT. We are looking into further experiments to dive deeper. The results are shown below:\n\n\n| Metrics |   |   |\n|---------|---------|---------|\n| ![results_loss](media/results_loss.png) | ![results_cross_entropy](media/results_cross_entropy.png) | ![results_perplexity](media/results_perplexity.png) |\n\n## TODOs\n\n- [x] Integrate [minGPT](https://github.com/karpathy/minGPT) and [pykan](https://github.com/KindXiaoming/pykan)\n- [x] Dataset downloading script for [WebText](https://github.com/openai/gpt-2-output-dataset)\n- [x] PyTorch Dataset parser for [WebText](https://github.com/openai/gpt-2-output-dataset)\n- [x] PyTorch Dataset parser for [tinyshakespeare](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt)\n- [x] Mini training POC for KAN-GPT\n  - [x] Integrate KAN training logic from `KAN.train_kan`\n  - [x] Train a dummy batch w/o any memory issues\n- [x] Mini training POC for MLP-GPT\n- [x] Train MLP-GPT on the webtext dataset as a baseline\n- [x] Train KAN-GPT on the webtext dataset as a baseline\n- [x] Metrics comparing KAN-GPT and MLP-GPT\n- [x] Auto Save checkpoints\n- [x] Auto Save checkpoints to W\u0026B\n- [ ] Auto Download model weights from git / huggingface\n- [x] W\u0026B hyperparam sweep script\n- [x] Script to load checkpoint in interactive mode\n- [ ] Reduce requirements.txt constraints\n- [ ] Define pydantic model for training and sweep args\n- [ ] Pruning the package, get rid of unused code\n- [ ] Training script to PyTorch Lightning\n- [x] Documentation: `mkdocs gh-deploy`\n- [x] Integrate with [efficient-kan](https://github.com/Blealtan/efficient-kan/blob/master/src/efficient_kan/kan.py)\n- [x] Test Cases\n  - [x] KAN: Forward-Backward test\n  - [x] GPT: Forward-Backward test\n  - [x] KAN_GPT: Forward-Backward test\n  - [x] EFFICIENT_KAN: Forward-Backward test\n\n## Development\n\nRead the 
[CONTRIBUTING.md](https://github.com/AdityaNG/kan-gpt/blob/main/CONTRIBUTING.md) file.\n\n## References\n\n- [minGPT](https://github.com/karpathy/minGPT)\n- [pykan](https://github.com/KindXiaoming/pykan)\n- [webtext](https://github.com/openai/gpt-2-output-dataset)\n- [tinyshakespeare](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAdityaNG%2Fkan-gpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAdityaNG%2Fkan-gpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAdityaNG%2Fkan-gpt/lists"}