{"id":18439419,"url":"https://github.com/idiap/sigma-gpt","last_synced_at":"2025-11-04T11:30:39.253Z","repository":{"id":245460464,"uuid":"818252383","full_name":"idiap/sigma-gpt","owner":"idiap","description":"σ-GPT: A New Approach to Autoregressive Models","archived":false,"fork":false,"pushed_at":"2024-08-14T09:06:41.000Z","size":1637,"stargazers_count":61,"open_issues_count":1,"forks_count":9,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-08T10:51:13.676Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/idiap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSES/AGPL-3.0-only.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-21T12:34:28.000Z","updated_at":"2024-12-07T15:04:55.000Z","dependencies_parsed_at":"2024-08-14T10:29:09.894Z","dependency_job_id":"e25b36b1-48ea-48b7-8b5a-f278f2f69a88","html_url":"https://github.com/idiap/sigma-gpt","commit_stats":null,"previous_names":["idiap/sigma-gpt"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fsigma-gpt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fsigma-gpt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fsigma-gpt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fsigma-gpt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/idiap","download_url":"https://codeload.github.com/idiap/
sigma-gpt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239433901,"owners_count":19637806,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T06:24:43.062Z","updated_at":"2025-11-04T11:30:39.167Z","avatar_url":"https://github.com/idiap.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\nCopyright © 2024 Idiap Research Institute \u003ccontact@idiap.ch\u003e\n\nSPDX-FileContributor: Arnaud Pannatier \u003carnaud.pannatier@idiap.ch\u003e\n\nSPDX-License-Identifier: AGPL-3.0-only\n--\u003e\n\n# $\\sigma$-GPT: A New Approach to Autoregressive Models\n\n\u003cdiv align='center'\u003e\n  Burst-sampling from a character-level σ-GPT model trained on Shakespeare.\n  \u003cimg alt=\"Burst-sampling from a character-level sigma-GPT\" src=\"media/console.gif\"\u003e\n\u003c/div\u003e\n\n\n## Project Overview\n\nWelcome to the $\\sigma$-GPT project! This repository houses the implementation of $\\sigma$-GPT, a model capable of generating signals in any direction. It includes a burst-sampling scheme that generates sequences in sublinear time. 
The methods and experiments here correspond to those presented in our paper, *$\\sigma$-GPT: A New Approach to Autoregressive Models*, accepted at ECML/PKDD 2024.\n\n[![Twitter Thread](https://img.shields.io/badge/Thread-000000?style=for-the-badge\u0026logo=X\u0026logoColor=white)](https://x.com/ArnaudPannatier/status/1799055129829839166)\n[![arXiv](https://img.shields.io/badge/arXiv-2404.09562-b31b1b?style=for-the-badge\u0026logo=arXiv\u0026logoColor=white)](https://arxiv.org/abs/2404.09562)\n![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge\u0026logo=python\u0026logoColor=ffdd54)\n![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=for-the-badge\u0026logo=PyTorch\u0026logoColor=white)\n\n## Credits\n\nThis repository is based on the following codebases:\n\n- Picoclvr [https://fleuret.org/git/picoclvr](https://fleuret.org/git/picoclvr)\n- NanoGPT [https://github.com/karpathy/nanoGPT](https://github.com/karpathy/nanoGPT)\n  - With an additional KV cache implementation from Vincent Micheli and Eloi Alonso, from the IRIS codebase [https://github.com/eloialonso/iris](https://github.com/eloialonso/iris)\n\n## One-Liner\n\nDetailed instructions are provided in the following sections, but here are some one-liners to get you started.\nThey should work provided that the environment is set up correctly.\n\n**Install and text modeling on CPU**:\n```bash\ngit clone git@github.com:idiap/sigma-gpt.git\ncd sigma-gpt\ngit submodule update --init --recursive\ncd text\n(cd nanoGPT/data/shakespeare_char/; python prepare.py)\npython train.py nanoGPT/config/train_shakespeare_char.py --max_iters=20000 --device=cpu\n```\n\nThen you can evaluate the model with:\n```bash\npython sample.py nanoGPT/config/train_shakespeare_char.py --device=cpu --max_tokens=255 --verbose=True\n```\n(Remove `--device=cpu` if you have a GPU; `mps` should also work on recent Macs.)\n\n\n## Installation\n\nThe implementation is 
adapted from two well-known codebases: `picoclvr` from François Fleuret and `nanoGPT` from Andrej Karpathy. We organized the code so that the two codebases are imported as submodules.\n\nFirst, clone the repository:\n\n```bash\ngit clone ...\n```\n\nThen set up the submodules:\n\n```bash\ngit submodule update --init --recursive\n```\n\n\nThe `environment.yml` file contains all the required dependencies. You should be able to create a working environment by running this command in the main folder:\n\n```bash\nconda env create -n sigma-gpt -f environment.yml\n```\n\n## Usage\n\n### Text modelling\n\nFor testing the model with fast training and evaluation on CPU, you can use the Shakespeare dataset. The model is trained at the character level.\n\nFirst, go to the text folder:\n\n```bash\ncd text\n```\n\nAnd prepare the dataset:\n```bash\n(cd nanoGPT/data/shakespeare_char/; python prepare.py)\n```\n\nThen you can train with the following command (remove `--device=cpu` if you have a GPU; `mps` should also work on recent Macs):\n\n```bash\npython train.py nanoGPT/config/train_shakespeare_char.py --device=cpu\n```\nEvaluation:\n```bash\npython sample.py nanoGPT/config/train_shakespeare_char.py --device=cpu --max_tokens=255 --verbose=True\n```\nWith `--verbose=True`, the generation is displayed in the terminal.\n\n\nFor larger pipelines, you can try:\n\nTraining:\n```bash\npython train.py\n```\n\nEvaluation:\n```bash\npython sample.py\n```\n\n\n### Non-NLP tasks\nThe commands below train the model and output results for each epoch.\n`main.py` contains many arguments that can be adapted to change the dataset, model size, number of layers, results folder, etc.\n\nFirst, go to the non-nlp folder:\n\n```bash\ncd non-nlp\n```\n**Maze**:\n```bash\npython main.py --task maze --training_strategy=\"shuffle\"\n```\n\n**Vertical**:\n```bash\npython main.py --task vertical --training_strategy=\"shuffle\"\n```\n\nN.B: the air traffic dataset is not publicly available yet, the procedure is ongoing, and the 
dataset will be available soon. Reach out to the author for more information.\nOnce it is available, it will be linked here.\n\n## Minimal GPU specs\nThe models in this repository are rather small and run without problems in a few hours on modest GPUs.\n\n## License\n\nThis software is distributed under the AGPL-3.0-only license. See `LICENSES/AGPL-3.0-only.txt` for more details.\n\n## Project status\n\nThis code was developed as a part of the Innosuisse MALAT: Machine Learning for Air Traffic project, which is a partnership between SkySoft ATM and the Idiap Research Institute.\n\nMain research partner: Prof. François Fleuret (UNIGE)\n\nProject manager: Didier Berling (SkySoft ATM)\n\nAuthor: Arnaud Pannatier \u003carnaud.pannatier@idiap.ch\u003e (Idiap Research Institute).\n\nFor any questions or remarks about this work or this research, feel free to contact the author.\n\n## Citation\n\nIf you use this code in your research, please cite the following paper:\n```bibtex\n@misc{pannatier2024sigmagpts,\n      title={{\\sigma}-GPTs: A New Approach to Autoregressive Models},\n      author={Arnaud Pannatier and Evann Courdier and François Fleuret},\n      year={2024},\n      eprint={2404.09562},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fsigma-gpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidiap%2Fsigma-gpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fsigma-gpt/lists"}