{"id":13798295,"url":"https://github.com/jayelm/gisting","last_synced_at":"2026-01-14T07:50:47.237Z","repository":{"id":154033285,"uuid":"629369483","full_name":"jayelm/gisting","owner":"jayelm","description":"Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467","archived":false,"fork":false,"pushed_at":"2025-02-14T15:11:51.000Z","size":16234,"stargazers_count":295,"open_issues_count":1,"forks_count":27,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-09-29T12:57:12.303Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jayelm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-04-18T07:13:59.000Z","updated_at":"2025-09-23T04:05:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"fc47f461-a619-444e-8780-859c190972b5","html_url":"https://github.com/jayelm/gisting","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jayelm/gisting","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Fgisting","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Fgisting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Fgisting/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Fgisting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jayelm","download_url":"ht
tps://codeload.github.com/jayelm/gisting/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayelm%2Fgisting/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413510,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T00:00:41.460Z","updated_at":"2026-01-14T07:50:47.219Z","avatar_url":"https://github.com/jayelm.png","language":"Python","readme":"# Learning to Compress Prompts with Gist Tokens\n\nThis repository contains code and data for \"Learning to Compress Prompts with Gist Tokens.\"\n\n## Setup\n\nThis codebase has been tested with python 3.9.16 and pytorch 2.0.0. I recommend creating a new virtual env (e.g. with Conda), then installing torch manually from `pytorch.org`. `pip install -r requirements.txt` should take care of the remaining dependencies.\n\n\u003e [!IMPORTANT]\n\u003e This codebase requires a specific version of Transformers: commit [fb366b9a](https://github.com/huggingface/transformers/tree/fb366b9a2a94b38171896f6ba9fb9ae8bffd77af). Installing from `requirements.txt` should install the correct version. 
**Newer Transformers releases will not work, as some function signatures have changed in `modeling_llama.py`** (see [this issue](https://github.com/jayelm/gisting/issues/10)).\n\n\u003e [!IMPORTANT]\n\u003e Training results are reproducible only with **DeepSpeed version 0.8.3.** For some (currently unknown) reason, newer DeepSpeed versions result in some performance degradation (see [this issue](https://github.com/jayelm/gisting/issues/9)).\n\n### Setup local directories\n\nBy default, experiment runs and model checkpoints are saved to the `exp/` directory\nin the root directory, and cached models (downloaded from the Hugging Face Hub) and\ndatasets are saved to `.cache/`. Be sure to create these directories before\nrunning for the first time.\n\nI recommend either changing these directories in the config, or symlinking them\nto wherever you have plenty of space on your machine.\n\nLLaMA-7B experiments expect a folder called `llama-7b` in the root directory\ncontaining the model weights and tokenizer.\n\n### Setup Weights \u0026 Biases (training only)\n\nIf you'd like to train models (not just use them), set up a [Weights \u0026 Biases account](https://wandb.ai/) for experiment logging, and replace my username with yours in the `wandb.entity` field of `src/conf/config.yaml` [here](https://github.com/jayelm/gisting/blob/main/src/conf/config.yaml#L22).\n\n## Demo + Checkpoints\n\nCheckpoints for the 1-token gist models for LLaMA-7B and FLAN-T5-XXL (as well as positive and negative controls) are now available on Hugging Face:\n\n- **LLaMA-7B**\n  - [Gist](https://huggingface.co/jayelm/llama-7b-gist-1)\n  - [Pos Control](https://huggingface.co/jayelm/llama-7b-pos_control-1)\n  - [Neg Control](https://huggingface.co/jayelm/llama-7b-neg_control-1)\n- **FLAN-T5-XXL**\n  - [Gist](https://huggingface.co/jayelm/flan-t5-xxl-gist-1)\n  - [Pos Control](https://huggingface.co/jayelm/flan-t5-xxl-pos_control-1)\n  - [Neg Control](https://huggingface.co/jayelm/flan-t5-xxl-neg_control-1)\n\n\u003e 
[!NOTE]\n\u003e The released LLaMA-7B checkpoints are **weight diffs**. You must have the base LLaMA-7B weights to recover the original model. Please use the `src/weight_diff.py` script to recover the original model given the weight diffs above, following the instructions [in the Alpaca repository](https://github.com/tatsu-lab/stanford_alpaca#recovering-alpaca-weights) (**but using my script instead**). Alternatively, if you use the `compress.py` script below and specify one of the Hugging Face diffs, the weight diff will be automatically applied for you if you supply `--base_llama_path`.\n\nTo use the model and try out gist caching, use the `src/compress.py` script, e.g.\n\n```\npython -m src.compress --model_name_or_path jayelm/llama-7b-gist-1 --base_llama_path llama-7b \\\n    --instruction \"Name the top cities in France that should not be missed. Include the best aspects of each place as well.\"\n```\n\nHere, `--instruction` is the prompt to be compressed and cached, and `--input` is an optional input you can supply that is not compressed.\n\n`compress.py` is well documented; use the `--help` flag for more details and browse the code to see how gist caching is done. If you're loading a FLAN-T5-XXL checkpoint, you do not need to supply `--base_llama_path`.\n\n\u003e [!NOTE]\n\u003e Gist compression is currently only supported for `batch_size = 1`. Larger batch sizes are mostly implemented for FLAN-T5-XXL, but I haven't checked correctness as carefully. 
For LLaMA-7B, larger batch sizes will require modifying the rotary position embeddings to account for gist offsets [here](https://github.com/jayelm/gisting/blob/main/src/gist_llama.py#L115-L125).\n\n## Training\n\nIf you'd like to retrain the Gist models, the command\n\n```\npython -m src.train \\\n    training.gist.num_gist_tokens=2 training.gist.condition=gist wandb.tag=yourtaghere\n```\n\ntrains a small model (FLAN-T5-base) on the Alpaca+ training dataset with **2**\ngist tokens and **gist** masking, while logging to wandb.\n\nChange the number of gist tokens with `num_gist_tokens`. `condition` should be\nset to `gist`, `pos_control` (no masking), or `neg_control` (masking that simply\nmasks out the instruction entirely).\n\nFor debugging, you may be interested in setting the `+experiment=debug` flag, which runs a small model (FLAN-T5-small) on a tiny number of samples and evaluations, just to check that the train/eval loop is working.\n\n\u003e [!NOTE]\n\u003e If you're not familiar with the CLI syntax, check out [Hydra](https://hydra.cc/).\n\n\u003e [!NOTE]\n\u003e If you receive a `ConfigValueError`, see [this issue](https://github.com/jayelm/gisting/issues/6) for a workaround.\n\n\u003e [!NOTE]\n\u003e For VSCode users, some example launch configurations for debugging are in `.vscode/launch.json`.\n\nTo train the larger models in the paper (FLAN-T5-XXL, LLaMA-7B), multi-GPU\ntraining is required with DeepSpeed. `./run_deepspeed.sh` contains an example, but the basic idea is:\n\n```\ndeepspeed --num_gpus=4 --no_local_rank --module src.train \\\n    +model={flan-t5-xxl,llama-7b} \\\n    deepspeed=ds_configs/stage3.json \\\n    training.gist.num_gist_tokens=2 \\\n    training.gist.condition=gist\n```\n\nThis trains either `flan-t5-xxl`/`llama-7b` with the same gist configuration as the\nfirst flan-t5-base command above, using the hyperparameters in the paper. 
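The gist masking selected by `condition=gist` can be pictured with a small standalone sketch. This is **not** the repo's implementation (see `src/gist_llama.py` for the real masking logic); it is an illustrative toy under the assumed layout `[instruction][gist tokens][rest]`, where positions after the gist block attend causally but cannot see the raw instruction, only the gist tokens:

```python
# Toy illustration (not this repo's code) of gist attention masking.
# Assumed sequence layout: [instruction tokens][gist tokens][rest].
# Rule: positions after the gist block may attend to the gist tokens
# and later positions, but NOT directly to the instruction.

def gist_mask(n_instruction, n_gist, n_rest):
    """Return a 0/1 attention mask as nested lists;
    mask[i][j] == 1 means position i may attend to position j."""
    n = n_instruction + n_gist + n_rest
    gist_start = n_instruction
    gist_end = n_instruction + n_gist
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            can_attend = j <= i  # standard causal mask
            # Post-gist positions cannot see the instruction directly.
            if i >= gist_end and j < gist_start:
                can_attend = False
            row.append(1 if can_attend else 0)
        mask.append(row)
    return mask

mask = gist_mask(n_instruction=3, n_gist=1, n_rest=2)
# The first post-gist position (index 4) sees only the gist token (3)
# and itself (4), not the instruction (0-2):
assert mask[4] == [0, 0, 0, 1, 1, 0]
```

Under this picture, `pos_control` keeps the plain causal mask, and `neg_control` instead masks out the instruction entirely, as described above.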
See\n`src/conf/{flan-t5-xxl,llama-7b}.yaml` for the hyperparameter configurations.\n\nThese experiments all assume a machine with 4 A100 80GB GPUs and at least 400GB\nof CPU RAM. Other machine configurations will necessitate changing the batch\nsize and/or DeepSpeed config.\n\nTake a look at other model configs in `src/conf/model`. In particular, there's a\n`llama-debug.yaml` file which trains a small randomly initialized LLaMA model\nfor debugging.\n\n### Logging\n\nBe sure to set your `wandb` entity name correctly in `src/conf/config.yaml`, if it is not your username.\n\nBy default this logs an experiment to wandb under a group name that begins with `wandb.tag` (i.e. in the example above, a group name beginning with `yourtaghere`); check out `src/conf/config.yaml` to see the full group name. Metrics are also logged to stdout, but there's a lot of other noise in stdout/stderr.\n\nThe main metrics to pay attention to in the wandb interface are `{seen,unseen,human}_{rouge1,rouge2,rougeL}`, which are the ROUGE scores for the seen/unseen/human splits, respectively.\n\nThe wandb group and run names define a directory which will save model checkpoints and outputs. By default it is `exp/{wandb.group}/{wandb.run}`. For example, if you run with the `+experiment=debug` setting, then the wandb group is set to `debug-alpaca-plus`. Saving model checkpoints is disabled in the debug config, but model outputs are nevertheless saved to `exp/debug-alpaca-plus/debug-alpaca-plus-run-42`. For example, `exp/debug-alpaca-plus/debug-alpaca-plus-run-42/outputs-100-validation_seen.csv` contains model outputs (greedy decode) on the seen split after 100 steps of training. These are useful for manual inspection, and are also the input for ChatGPT evaluation (see below).\n\n### Launching via SLURM\n\nUse `sbatch run_deepspeed.sh` to submit multi-GPU DeepSpeed jobs to a SLURM cluster, or just run in an interactive session.\n\nIf you are not using DeepSpeed, i.e. training smaller models (e.g. 
FLAN-T5-base, LLaMA-debug) on a single GPU, you can also use `hydra-submitit-launcher` to conveniently transform a local run into a SLURM batch job submission. Do `pip install hydra-submitit-launcher`, then specify `+launcher=slurm` via CLI to send a job to SLURM. Use of `-m` or `--multirun` as a Hydra option is required for the SLURM launcher to be picked up. Configure SLURM parameters (e.g. partition, account) in `src/conf/launcher/slurm.yaml`.\n\nThis is particularly useful with Hydra's sweep functionality. For example, the command\n\n```\npython -m src.train -m +experiment=flan-t5-base wandb.tag=sweep-demo \\\n    training.gist.condition=gist,pos_control,neg_control\n```\n\nsubmits an array of 3 jobs to SLURM, sweeping across the gist conditions.\n\n## ChatGPT Evaluation\n\nROUGE results are logged automatically during training above, but ChatGPT evaluation must be run manually.\n\nObtain filepaths to the predictions from the two models you'd like to compare. Outputs used for evaluation in the paper are in `data/results/{FLAN-T5,LLaMA}-{gist,pos_control,neg_control}-{1,2,5,10}`, sweeping over the model, gist condition, and number of gist tokens, respectively.\n\nFor example, say we want to compare the LLaMA single-gist-token model to its pos control. You can use the `scripts/eval_chatgpt.py` script:\n\n```\npython scripts/eval_chatgpt.py \\\n    --a data/results/model_outputs/LLaMA-gist-1/outputs-3000-validation_human.csv \\\n    --a_name LLaMA-gist-1 \\\n    --b data/results/model_outputs/LLaMA-pos_control-1/outputs-3000-validation_human.csv \\\n    --b_name LLaMA-pos-control-1 \\\n    --results_file my_comparison.json\n```\n\nYou will need a valid OpenAI API key; follow the OpenAI API setup instructions.\n\nOccasionally ChatGPT will spit out something that cannot be parsed by the JSON parser. In these cases it will log to stderr and the JSON for the result will have a \"COULD NOT PARSE JSON\" message. 
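If you want to locate those failures programmatically, a scan along these lines works; note the record layout here is an assumption (a results file containing a JSON list of records that may contain the marker string), not taken from `eval_chatgpt.py` itself:

```python
import json

def find_unparsed(results_path):
    """Return indices of result records containing the parser-failure
    marker. Assumes (hypothetically) the results file holds a JSON list."""
    with open(results_path) as f:
        results = json.load(f)
    # Serialize each record back to a string so the marker is found
    # regardless of which field it landed in.
    return [i for i, record in enumerate(results)
            if "COULD NOT PARSE JSON" in json.dumps(record)]
```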
You can search for these messages in the results file, manually fix the responses, and change the overall scores accordingly.\n\nThe ChatGPT comparisons reported in the paper are located in `data/results/chatgpt`.\n\n## Benchmarking\n\nThe training script has benchmarking functionality, which was used to obtain the benchmarking results in the paper.\n\nBenchmarking was done without DeepSpeed on a single A100 80GB GPU, though a 40GB GPU is likely fine too. An example command for benchmarking is (also available as a VSCode launch config):\n\n```\npython -m src.train \\\n    training.do_train=false \\\n    training.do_eval=true \\\n    training.do_benchmarking=true \\\n    training.do_benchmarking_sanity_checks=true \\\n    training.gist.num_gist_tokens=1 \\\n    training.gist.condition=gist \\\n    model.model_name_or_path=YOUR_PATH_TO_PRETRAINED_GIST_MODEL \\\n    training.benchmarking_profiler=pytorch \\\n    training.benchmarking_output_file=my_benchmarking.csv\n```\n\nSome notes here:\n\n- If you trained with the DeepSpeed config above, you will likely need to convert the DeepSpeed model checkpoint into a standard fp32 PyTorch model file by running `./zero_to_fp32.py . pytorch_model.bin` in the checkpoint you'd like to benchmark.\n- You can use either the [PyTorch default profiler](https://pytorch.org/docs/stable/profiler.html) or the [DeepSpeed FLOPs profiler](https://www.deepspeed.ai/tutorials/flops-profiler/) by setting `training.benchmarking_profiler`. The paper uses the PyTorch default profiler.\n- `do_benchmarking_sanity_checks=true` activates gist caching sanity checking, where we verify that model outputs and decodes are the same with and without gist caching.\n\n\u003e [!NOTE]\n\u003e For the larger models, we found that we would often fail gist sanity checks due to floating point errors. 
If you run the larger models with sanity checking on, you will find some torch assertion errors where 99% of the model states are identical, except for one value here or there.\n\n\u003e [!NOTE]\n\u003e We did not heavily optimize the gist caching implementation, so wall-clock speedups (especially CPU times) are likely small or even non-existent due to the increased Python logic for gist caching. The main point of the gist caching implementation in this paper is to show that it can be done, and to sanity-check that the attention masking during training supports such caching behavior at inference time. The biggest gains from gist caching are likely to be achieved with custom, lower-level implementations that optimize for inference latency.\n\nAs with the other sections, the benchmarking results used in the paper are available in the `data/benchmarking` folder. See `data/README.md` for more details.\n\n## Data\n\nThe Alpaca+ data is located in `data/alpaca_plus`. ChatGPT evaluations, raw model outputs, and benchmarking stats used for the paper are located in `data/results` and `data/benchmarking`.\n\n## License\n\nThe codebase is licensed Apache 2.0 (see `LICENSE`). The data is a mixture of\nSelf-Instruct (Apache 2.0) and Stanford Alpaca (CC BY-NC 4.0). 
By training on a\nmixture of the data, you inherit both licenses.\n\n## Thanks\n\nTo the Stanford Alpaca team for assistance with the Alpaca data and finetuning.\n\n## Citation\n\nIf you found this work useful, please cite\n\n```bibtex\n@article{mu2023learning,\n    title={Learning to Compress Prompts with Gist Tokens}, \n    author={Jesse Mu and Xiang Lisa Li and Noah Goodman},\n    year={2023},\n    eprint={2304.08467},\n    archivePrefix={arXiv},\n    primaryClass={cs.CL}\n}\n```\n","funding_links":[],"categories":["Language","Python"],"sub_categories":["2023"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjayelm%2Fgisting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjayelm%2Fgisting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjayelm%2Fgisting/lists"}