{"id":22410131,"url":"https://github.com/kssteven418/ltp","last_synced_at":"2025-09-03T05:44:19.914Z","repository":{"id":44442833,"uuid":"381890565","full_name":"kssteven418/LTP","owner":"kssteven418","description":"[KDD'22] Learned Token Pruning for Transformers ","archived":false,"fork":false,"pushed_at":"2023-02-27T21:34:41.000Z","size":42089,"stargazers_count":98,"open_issues_count":10,"forks_count":18,"subscribers_count":1,"default_branch":"ltp/main","last_synced_at":"2025-07-20T09:34:53.119Z","etag":null,"topics":["bert","efficient-model","efficient-neural-networks","model-compression","natural-language-processing","pruning","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2107.00910","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kssteven418.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-07-01T02:50:42.000Z","updated_at":"2025-06-02T15:09:11.000Z","dependencies_parsed_at":"2022-08-30T14:40:13.337Z","dependency_job_id":"fe04a1d3-90d1-45ad-9d98-f4808cd5189c","html_url":"https://github.com/kssteven418/LTP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kssteven418/LTP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2FLTP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2FLTP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2FLTP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2F
LTP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kssteven418","download_url":"https://codeload.github.com/kssteven418/LTP/tar.gz/refs/heads/ltp/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2FLTP/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273397843,"owners_count":25098234,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-03T02:00:09.631Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","efficient-model","efficient-neural-networks","model-compression","natural-language-processing","pruning","transformer"],"created_at":"2024-12-05T12:12:15.609Z","updated_at":"2025-09-03T05:44:19.891Z","avatar_url":"https://github.com/kssteven418.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LTP: Learned Token Pruning for Transformers\n\n![Screenshot from 2021-07-08 13-39-02](https://user-images.githubusercontent.com/50283958/124863552-376e7e80-dff2-11eb-8930-707664858adc.png)\n\n**Check our [paper](https://arxiv.org/abs/2107.00910) for more details.**\n\n# Installation\nWe follow the same installation procedure as the original [Huggingface transformer](https://github.com/huggingface/transformers) repo.\n```\npip install sklearn scipy datasets torch\npip install -e .  
# in the top directory\n```\n\n# Prepare Checkpoints\nLTP is implemented on top of Huggingface transformer's [I-BERT implementation](https://github.com/huggingface/transformers/tree/master/src/transformers/models/ibert).\nTherefore, we first need to generate a checkpoint of ibert finetuned on the target downstream task.\nWhile you can do this on the original Huggingface repository, \nwe also provide a [base branch](https://github.com/kssteven418/LTP/tree/ltp/base) `ltp/base` where you can run the following code to finetune `ibert` on the GLUE tasks.\n\n```\ngit checkout ltp/base\ncd examples/text-classification\npython run_glue.py --model_name_or_path kssteven/ibert-roberta-base --output_dir {CKPT} --task {TASK} --do_train --do_eval {--some_more_arguments}\n```\n\n* `{TASK}`: RTE, MRPC, STSB, SST2, QNLI, QQP, MNLI\n* Please refer to the [Huggingface tutorial](https://huggingface.co/transformers/v2.3.0/examples.html) and the [official documentation](https://huggingface.co/transformers/main_classes/trainer.html#trainingarguments) for more details on the arguments and hyperparameters.\n* Note that by default ibert behaves the same as roberta (see this [tutorial](https://huggingface.co/kssteven/ibert-roberta-base)), \nhence the resulting model will be the same as `roberta-base` finetuned on the target GLUE task.\n\nThe final model will be checkpointed in `{CKPT}`. 
\n* Remove `{CKPT}/trainer_state.json`.\n* In the configuration file `{CKPT}/config.json`, change (1) `\"architectures\"` to `[\"LTPForSequenceClassification\"]` and (2) `\"model_type\"` to `\"ltp\"`.\n\n\n# Run Learned Token Pruning\nAdd the following lines to the configuration file `{CKPT}/config.json`.\n```\n\"prune_mode\": \"absolute_threshold\",\n\"final_token_threshold\": 0.01, \n```\n\n`final_token_threshold` determines the token threshold of the last layer, and the thresholds of the remaining layers are scaled linearly.\nFor instance, setting `final_token_threshold`, i.e., the threshold for the last (12th) layer, to 0.01 gives thresholds of 0.0025, 0.005, and 0.0075 for the 3rd, 6th, and 9th layers, respectively.\nThis number is a hyperparameter, and we found that 0.01 works well in many cases.\n\nThe learnable mode consists of two stages: soft threshold and hard threshold.\nPlease refer to our [paper](https://arxiv.org/abs/2107.00910) for more details.\n\n## 1. Soft Threshold\nWe first train the model using the soft threshold mode. \nThis trains the thresholds as well as the model parameters to search for the best threshold configuration.\n\nRun the following command:\n```\npython run.py --arch ltp-base --task {TASK} --restore {CKPT} --lr 2e-5 --temperature {T}\\\n  --lambda 0.1 --weight_decay 0 --bs 64 --masking_mode soft --epoch {epoch} --save_step 100 --no_load\n```\n\n* `{TASK}`: RTE, MRPC, STSB, SST2, QNLI, QQP, MNLI\n* You can assign a different learning rate for `lr`, but 2e-5 worked fine.\n* We set `{epoch}` to 10 for smaller datasets (e.g., RTE, MRPC) and 1 for larger datasets (e.g., SST2, QNLI).\n* The `--no_load` flag skips loading the best model at the end of training (i.e., the final checkpoint will be the one at the end of training).\n* `lambda` is an important hyperparameter that controls the pruning level: the higher the value, the more we prune tokens. 
Values from 0.01 to 0.2 worked well in many cases, but we recommend searching empirically for the best value.\n* `temperature` is another hyperparameter, and 1e-3 ~ 1e-5 worked well. In the paper, we searched over {1e-4, 2e-4, 5e-4, 1e-3, 2e-3}.\n\nThe final model will be checkpointed in `{CKPT_soft} = checkpoints/base/{TASK}/absolute_threshold/rate_{final_token_threshold}/temperature_{T}/lambda_{lambda}/lr_{lr}`.\nRemove `trainer_state.json` from `{CKPT_soft}`.\n\n\n## 2. Hard Threshold\nOnce we learn the thresholds, we fix those values, switch back to the hard threshold mode, and finetune the model parameters only.\n\nRun the following command:\n```\npython run.py --arch ltp-base --task {TASK} --restore {CKPT_soft} --lr {LR} --bs 64 --masking_mode hard --epoch 5 \n```\n\n* We used `{LR}` {0.5, 1, 2}e-5 in the paper.\n* You can additionally set `--save_step 500` for more frequent evaluation/logging. The default setting evaluates once per epoch.\n\nThe final model will be checkpointed in `{CKPT_soft}/hard/lr_{LR}`.\n\n\n# Run Baseline Methods\nWe additionally provide code to reproduce the baseline methods used in our paper (i.e., top-k and manual threshold).\n\n## Top-k Token Pruning\nAdd the following lines to `{CKPT}/config.json`.\n```\n\"prune_mode\": \"topk\",\n\"token_keep_rate\": 0.2,\n```\n\nThe token keep rates of the first three layers and the last layer are 1 and `token_keep_rate`, respectively. 
\nThe keep rates of the remaining layers are scaled linearly.\nThe smaller `token_keep_rate` is, the more aggressively we prune tokens.\nYou can also assign a negative number to `token_keep_rate`; in that case, the keep rate of each layer will be set to `max(0, keep_rate)`.\n\nRun the following command:\n\n```\npython run.py --arch ltp-base --task {TASK} --restore {CKPT} --lr {LR} --bs 64 --masking_mode hard --epoch 5\n```\n\n* We used `{LR}` {0.5, 1, 2}e-5 in the paper.\n* You can additionally set `--save_step 500` for more frequent evaluation/logging. The default setting evaluates once per epoch.\n\n\nThe final model will be checkpointed in `{CKPT}/topk/lr_{LR}`.\n\n\n## Manual (Non-learnable) Threshold Pruning\nAdd the following lines to `{CKPT}/config.json`.\n```\n\"prune_mode\": \"absolute_threshold\",\n\"final_token_threshold\": 0.01, \n```\n\nRun the following command:\n```\npython run.py --arch ltp-base --task {TASK} --restore {CKPT} --lr {LR} --bs 64 --masking_mode hard --epoch 5 --save_step 500\n```\n\n* We used `{LR}` {0.5, 1, 2}e-5 in the paper.\n* `--save_step 500` gives more frequent evaluation/logging. Without it, the default setting evaluates once per epoch.\n* Note that the only difference from the learned token pruning mode is that we run the hard threshold mode from the beginning.\n\n\nThe final model will be checkpointed in `{CKPT}/hard/lr_{LR}`.\n\n\n# Copyright\n```\nTHIS SOFTWARE AND/OR DATA WAS DEPOSITED IN THE BAIR OPEN RESEARCH COMMONS REPOSITORY ON 02/07/23.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkssteven418%2Fltp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkssteven418%2Fltp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkssteven418%2Fltp/lists"}