{"id":19674041,"url":"https://github.com/frankaging/causal-distill-xxs","last_synced_at":"2026-03-15T11:34:35.323Z","repository":{"id":43099825,"uuid":"453576315","full_name":"frankaging/Causal-Distill-XXS","owner":"frankaging","description":"The Codebase for Causal Distillation for Task-Specific Models","archived":false,"fork":false,"pushed_at":"2022-11-19T23:40:14.000Z","size":2291,"stargazers_count":4,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-09T04:40:03.737Z","etag":null,"topics":["causality","language-model","model-distillation","natural-language-understanding"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2112.02505","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/frankaging.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-30T03:14:00.000Z","updated_at":"2022-02-23T04:59:23.000Z","dependencies_parsed_at":"2022-09-05T22:51:31.836Z","dependency_job_id":null,"html_url":"https://github.com/frankaging/Causal-Distill-XXS","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/frankaging/Causal-Distill-XXS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Distill-XXS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Distill-XXS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Distill-XXS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Distill-XXS/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/frankaging","download_url":"https://codeload.github.com/frankaging/Causal-Distill-XXS/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Distill-XXS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30540982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-15T07:17:37.589Z","status":"ssl_error","status_checked_at":"2026-03-15T07:17:31.738Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causality","language-model","model-distillation","natural-language-understanding"],"created_at":"2024-11-11T17:17:03.082Z","updated_at":"2026-03-15T11:34:35.299Z","avatar_url":"https://github.com/frankaging.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Python 3.7](https://img.shields.io/badge/python-3.7-blueviolet.svg?style=plastic)\n![License CC BY-NC](https://img.shields.io/badge/license-MIT-05b502.svg?style=plastic)\n\n# Causal Distillation for Natural Language Understanding Tasks (DIITO-XXS)\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://i.ibb.co/Q8NNHPJ/Screen-Shot-2021-12-06-at-4-53-28-PM.png\" style=\"float:left\" width=\"800px\"\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003c/p\u003e\n\nThis 
is an **ONGOING** research effort, so don't expect everything to work. It is an extended implementation of our preprint [Causal Distillation for Language Models](https://zen-wu.social/papers/ACL22_CausalDistill.pdf) that applies the method to task-specific models (i.e., the teacher model here is a fine-tuned model). The codebase for the distillation method, **the distillation interchange intervention training objective (DIITO)**, can be found [here](https://github.com/frankaging/Causal-Distill).\n\nWe fork our main codebase from [PKD Distillation](https://github.com/intersun/PKD-for-BERT-Model-Compression) to ensure a fair comparison.\n\n## Release Notes\n:white_check_mark: 02/21/2022 Released this codebase for others who are interested in applying [DIITO](https://github.com/frankaging/Causal-Distill) to task-specific models.\n\nIf you experience any issues or have suggestions, please contact me either through the issues page or at wuzhengx@stanford.edu.\n\n## Main Contents\n* [Citation](#citation)\n* [Requirements](#requirements)\n* [Distillation](#distillation)\n\n## Citation\nIf you use this repository, please cite the following two papers: [paper for interchange intervention training](https://arxiv.org/abs/2112.00826), and [paper for our distillation method](https://arxiv.org/abs/2109.08994).\n```stex\n  @article{geiger-etal-2021-iit,\n        title={Inducing Causal Structure for Interpretable Neural Networks}, \n        author={Geiger, Atticus and Wu, Zhengxuan and Lu, Hanson and Rozner, Josh and Kreiss, Elisa and Icard, Thomas and Goodman, Noah D. 
and Potts, Christopher},\n        year={2021},\n        eprint={2112.00826},\n        archivePrefix={arXiv},\n        primaryClass={cs.LG}\n  }\n\n  @article{wu-etal-2021-distill,\n        title={Causal Distillation for Language Models}, \n        author={Wu, Zhengxuan and Geiger, Atticus and Rozner, Josh and Kreiss, Elisa and Lu, Hanson and Icard, Thomas and Potts, Christopher and Goodman, Noah D.},\n        year={2021},\n        eprint={2112.02505},\n        archivePrefix={arXiv},\n        primaryClass={cs.CL}\n  }\n```\n\n## Requirements\n- Python 3.6 or 3.7 is supported.\n- PyTorch Version: 1.9.0\n- Transformers Version: 4.11.3\n- Datasets Version: 1.8.0\n- Since we build our codebase off the [Huggingface Distillation Interface](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation), please review their documentation for requirements.\n\n## Distillation\nHere is an example command for distilling with our causal distillation objective or without it:\n```bash\npython KD_training.py \\\n--task_name SST-2 \\\n--output_dir data/outputs/KD/SST-2/teacher_12layer/ \\\n--bert_model bert-base-uncased \\\n--max_seq_length 128 \\\n--train_batch_size 32 \\\n--learning_rate 2e-5 \\\n--num_train_epochs 5 \\\n--eval_batch_size 32 \\\n--gradient_accumulation_steps 1 \\\n--log_interval 10 \\\n--checkpoint_interval 100 \\\n--do_train \\\n--fp16 False \\\n--student_hidden_layers 6 \\\n--fc_layer_idx 1,3,5,7,9 \\\n--kd_model kd \\\n--alpha 0.7 \\\n--T 20 \\\n--is_wandb \\\n--wandb_metadata wuzhengx:DIITO-XXS \\\n--neuron_mapping full \\\n--is_diito \\\n--interchange_prop 0.3\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrankaging%2Fcausal-distill-xxs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrankaging%2Fcausal-distill-xxs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrankaging%2Fcausal-distill-xxs/lists"}