{"id":19674036,"url":"https://github.com/frankaging/causal-proxy-model","last_synced_at":"2025-07-09T06:08:22.871Z","repository":{"id":70142656,"uuid":"479931815","full_name":"frankaging/Causal-Proxy-Model","owner":"frankaging","description":"The Codebase for Causal Proxy Model","archived":false,"fork":false,"pushed_at":"2022-09-29T16:58:49.000Z","size":40006,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-29T01:42:28.465Z","etag":null,"topics":["causal-inference","concept-learning","explainable-ai","interpretability","model-explanation"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2209.14279","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/frankaging.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-04-10T06:22:41.000Z","updated_at":"2024-11-06T14:42:46.000Z","dependencies_parsed_at":"2023-03-04T19:15:18.156Z","dependency_job_id":null,"html_url":"https://github.com/frankaging/Causal-Proxy-Model","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/frankaging/Causal-Proxy-Model","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Proxy-Model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Proxy-Model/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Proxy-Model/releases","manifests_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Proxy-Model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/frankaging","download_url":"https://codeload.github.com/frankaging/Causal-Proxy-Model/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankaging%2FCausal-Proxy-Model/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264403797,"owners_count":23602621,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causal-inference","concept-learning","explainable-ai","interpretability","model-explanation"],"created_at":"2024-11-11T17:17:02.656Z","updated_at":"2025-07-09T06:08:22.865Z","avatar_url":"https://github.com/frankaging.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Python 3.7](https://img.shields.io/badge/python-3.7-blueviolet.svg?style=plastic)\n![License CC BY-NC](https://img.shields.io/badge/license-MIT-05b502.svg?style=plastic)\n\n\u003ch1 align=\"center\"\u003e\n  \u003cb\u003eCausal Proxy Models For Concept-Based Model Explanations\u003c/b\u003e\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cb\u003e\u003ca href=\"https://nlp.stanford.edu/~wuzhengx/\"\u003eZhengxuan Wu\u003c/a\u003e*, \u003ca href=\"https://www.kareldoosterlinck.com/\"\u003eKarel D'Oosterlinck\u003c/a\u003e*, \u003ca href=\"https://atticusg.github.io/\"\u003eAtticus Geiger\u003c/a\u003e*, \u003ca href=\"https://www.linkedin.com/in/amir-zur-a924ba187/\"\u003eAmir Zur\u003c/a\u003e, \u003ca 
href=\"https://web.stanford.edu/~cgpotts/\"\u003eChristopher Potts\u003c/a\u003e\u003c/b\u003e\u003c/span\u003e\n\u003c/p\u003e\n\nThe codebase contains some implementations of our preprint [Causal Proxy Models For Concept-Based Model Explanations](https://arxiv.org/abs/2209.14279). In this paper, we introduce two variants of CPM:\n* CPM\u003csub\u003eIN\u003c/sub\u003e: Input-based CPM uses an auxiliary token to represent the intervention and is trained with supervision to predict the counterfactual output. This model is built on an input-level intervention.\n* CPM\u003csub\u003eHI\u003c/sub\u003e: Hidden-state CPM uses Interchange Intervention Training (IIT) to localize concept information within its representations and swaps hidden states to represent the intervention. It is also trained with supervision to predict the counterfactual output. This model is built on a hidden-state intervention.\n\nThis codebase contains implementations and experiments for both **CPM\u003csub\u003eIN\u003c/sub\u003e** and **CPM\u003csub\u003eHI\u003c/sub\u003e**. If you experience any issues or have suggestions, please contact us through the issues page or at wuzhengx@cs.stanford.edu or karel.doosterlinck@ugent.be.\n\n## Citation\nIf you use this repository, please consider citing our relevant papers:\n```stex\n  @article{wu-etal-2021-cpm,\n        title={Causal Proxy Models For Concept-Based Model Explanations}, \n        author={Wu, Zhengxuan and D'Oosterlinck, Karel and Geiger, Atticus and Zur, Amir and Potts, Christopher},\n        year={2022},\n        eprint={2209.14279},\n        archivePrefix={arXiv},\n        primaryClass={cs.LG}\n  }\n\n  @article{geiger-etal-2021-iit,\n        title={Inducing Causal Structure for Interpretable Neural Networks}, \n        author={Geiger, Atticus and Wu, Zhengxuan and Lu, Hanson and Rozner, Josh and Kreiss, Elisa and Icard, Thomas and Goodman, Noah D. 
and Potts, Christopher},\n        year={2021},\n        eprint={2112.00826},\n        archivePrefix={arXiv},\n        primaryClass={cs.LG}\n  }\n```\n\n## Requirements\n- Python 3.6 or 3.7 is supported.\n- PyTorch Version: 1.11.0\n- Transformers Version: 4.21.1\n- Datasets Version: 2.3.2\n\n\n## Installation\nFirst clone the repository. Then run the following command to initialize the submodules:\n\n```bash\ngit submodule init; git submodule update\n```\n\n## Loading Black-box Models for CEBaB\nThese models are available from the [CEBaB website](https://cebabing.github.io/CEBaB/). Here is an example of how to load them:\n\n```python\nfrom transformers import AutoTokenizer, BertForNonlinearSequenceClassification\n\ntokenizer = AutoTokenizer.from_pretrained(\"CEBaB/bert-base-uncased.CEBaB.sa.5-class.exclusive.seed_42\")\n\nmodel = BertForNonlinearSequenceClassification.from_pretrained(\"CEBaB/bert-base-uncased.CEBaB.sa.5-class.exclusive.seed_42\")\n```\n\n## Loading **CPMs** for CEBaB\nWe aim to make all of our **CPMs** public. Currently, they can be found on [our Hugging Face repo](https://huggingface.co/CPMs).\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\ntokenizer = AutoTokenizer.from_pretrained(\"CPMs/cpm.hi.bert-base-uncased.layer.10.size.192\")\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\"CPMs/cpm.hi.bert-base-uncased.layer.10.size.192\")\n```\n\nNote that we also have different helpers to load these models into our explainer module. Please refer to the notebooks under the `experiments` folder.\n\n## Training **CPM\u003csub\u003eIN\u003c/sub\u003e**\n\nTo train **CPM\u003csub\u003eIN\u003c/sub\u003e**, we follow the basic finetuning setup since the intervention is on the inputs. 
To train, first go to `CEBaB-inclusive/eval_pipeline/` and run the following command.\n\n```bash\npython main.py \\\n--model_architecture bert-base-uncased \\\n--train_setting inclusive \\\n--model_output_dir model_output \\\n--output_dir output \\\n--flush_cache true \\\n--task_name opentable_5_way \\\n--batch_size 128 \\\n--k_array 19684\n```\n\nTo train with *approximate counterfactuals* instead, set the flag `--train_setting approximate`, which uses metadata-sampled counterfactuals. In this setting, you can ignore the field `--k_array`. Change `--model_architecture` to use different model architectures.\n\n## Training **CPM\u003csub\u003eHI\u003c/sub\u003e**\n\nTo train **CPM\u003csub\u003eHI\u003c/sub\u003e**, we adapt interchange intervention training (IIT). Use the following command to train, and refer to our paper for configurations.\n\n```bash\npython Proxy_training.py \\\n--model_name_or_path ./saved_models/bert-base-uncased.opentable.CEBaB.sa.5-class.exclusive.seed_42/ \\\n--task_name CEBaB \\\n--dataset_name CEBaB/CEBaB \\\n--do_train \\\n--per_device_train_batch_size 256 \\\n--per_device_eval_batch_size 256 \\\n--learning_rate 8e-05 \\\n--output_dir ./proxy_training_results/your_first_try/ \\\n--cache_dir ./train_cache/ \\\n--seed 42 \\\n--report_to none \\\n--logging_steps 1 \\\n--alpha 1.0 \\\n--beta 1.0 \\\n--gemma 3.0 \\\n--overwrite_output_dir \\\n--intervention_h_dim 192 \\\n--counterfactual_type true \\\n--k 19684 \\\n--interchange_hidden_layer 10 \\\n--save_steps 10 \\\n--early_stopping_patience 20\n```\nTo train with *approximate counterfactuals* instead, set the flag `--counterfactual_type approximate`, which uses metadata-sampled counterfactuals. In this setting, you can ignore the field `--k`. Change `--model_name_or_path` to use different model architectures. 
These models can be downloaded from the [CEBaB website](https://cebabing.github.io/CEBaB/).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrankaging%2Fcausal-proxy-model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrankaging%2Fcausal-proxy-model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrankaging%2Fcausal-proxy-model/lists"}