{"id":17871323,"url":"https://github.com/da03/implicit_chain_of_thought","last_synced_at":"2025-04-05T06:04:11.546Z","repository":{"id":203833693,"uuid":"695389764","full_name":"da03/implicit_chain_of_thought","owner":"da03","description":null,"archived":false,"fork":false,"pushed_at":"2024-11-11T14:43:19.000Z","size":74542,"stargazers_count":123,"open_issues_count":8,"forks_count":25,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-29T05:06:27.785Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/da03.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-23T03:20:11.000Z","updated_at":"2025-03-27T03:05:46.000Z","dependencies_parsed_at":"2023-11-12T21:26:10.117Z","dependency_job_id":"57b0d245-3aa8-4ffd-9d61-bda8a51ffed8","html_url":"https://github.com/da03/implicit_chain_of_thought","commit_stats":null,"previous_names":["da03/implicit_chain_of_thought"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fimplicit_chain_of_thought","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fimplicit_chain_of_thought/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fimplicit_chain_of_thought/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fimplicit_chain_of_thought/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/da03
","download_url":"https://codeload.github.com/da03/implicit_chain_of_thought/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247294516,"owners_count":20915340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-28T10:30:21.520Z","updated_at":"2025-04-05T06:04:11.527Z","avatar_url":"https://github.com/da03.png","language":"Python","funding_links":[],"categories":["Python","3. Technical Paradigms for Implicit Reasoning"],"sub_categories":["3.1 Latent Optimization"],"readme":"# Implicit Chain of Thought Reasoning via Knowledge Distillation\n\nHere we provide code to reproduce our results.\n\n## Prerequisites\n\n* [PyTorch](https://pytorch.org/get-started/locally/)\n* [transformers](https://github.com/huggingface/transformers) (`pip install transformers`)\n\n## Datasets \u0026 Pretrained Models \u0026 Logs\n\nAll dataset files and log files during inference are included in this repo, with the exception of large training files maintained under Git LFS. Model checkpoints are stored on Google Drive. 
The folder containing all checkpoints can be found at [this link](https://drive.google.com/drive/folders/1Sclr5bmLZIUcktCaFAeWRTevRGLUwlC_?usp=drive_link).\n\n* 4 X 4 Mult - GPT-2: [data](data/4_by_4_mult/) [model](https://drive.google.com/drive/folders/1Zp-PFwiHkwq0wuFScjN5R8jDdXdnQYQ_?usp=sharing) [log](logs/4_by_4_mult/gpt2/log.generate)\n* 4 X 4 Mult - GPT-2 Medium: [data](data/4_by_4_mult/) [model](https://drive.google.com/drive/folders/1B0e67ifTSTTuUg0Sh-of5135Rh4KQ-2v?usp=sharing) [log](logs/4_by_4_mult/gpt2-medium/log.generate)\n* 5 X 5 Mult - GPT-2: [data](data/5_by_5_mult/) [model](https://drive.google.com/drive/folders/1lHa2Xey8jJ3__RsYRhcOFHU7Xfqp7XTG?usp=sharing) [log](logs/5_by_5_mult/gpt2/log.generate)\n* 5 X 5 Mult - GPT-2 Medium: [data](data/5_by_5_mult/) [model](https://drive.google.com/drive/folders/18dRIynq0j5EBOnKTpOPaLJWCoMBXZYTi?usp=sharing) [log](logs/5_by_5_mult/gpt2-medium/log.generate)\n* GSM8K - GPT-2: [data](data/gsm8k/) [model](https://drive.google.com/drive/folders/1aFBBcUr_vHtaDqgpU5A1ErEvrJyX-cEO?usp=sharing) [log](logs/gsm8k/gpt2/log.generate)\n* GSM8K - GPT-2 Medium: [data](data/gsm8k/) [model](https://drive.google.com/drive/folders/1zFXfwq5jDjgKpbUVafY5KC0LmJpYXjQK?usp=sharing) [log](logs/gsm8k/gpt2-medium/log.generate)\n\n## Usage\n\nWe use 4 X 4 Mult with GPT2-Small as an example. 
We assume that the working directory is `implicit_chain_of_thought` throughout this document.\n\n### Data Format\n\nEach line of the training, validation, and test files has the following format:\n\n```\n[input 1]||[chain-of-thought 1] #### [output 1]\n[input 2]||[chain-of-thought 2] #### [output 2]\n[input 3]||[chain-of-thought 3] #### [output 3]\n...\n```\n\nAs an example, let's take a look at the first line from the 4 X 4 Mult test set in [data/4_by_4_mult/test_bigbench.txt](data/4_by_4_mult/test_bigbench.txt):\n\n```\n9 1 7 3 * 9 4 3 3||1 7 4 3 3 + 0 6 7 8 4 1 ( 1 3 2 2 8 1 ) + 0 0 7 5 1 1 1 ( 1 3 9 7 9 2 1 ) + 0 0 0 7 5 1 1 1 #### 1 3 9 4 5 4 2 1\n```\n\nIn this example, the input is `9 1 7 3 * 9 4 3 3` (corresponding to `3719*3349`; digits are written least-significant first), the chain-of-thought is `1 7 4 3 3 + 0 6 7 8 4 1 ( 1 3 2 2 8 1 ) + 0 0 7 5 1 1 1 ( 1 3 9 7 9 2 1 ) + 0 0 0 7 5 1 1 1`, and the output is `1 3 9 4 5 4 2 1` (corresponding to `12454931`).\n\nNote that the chain-of-thought steps are used for Teacher Training, (a) Mind-Reading the Teacher, and (b) Thought Emulation, but not for (c) Couple and Optimize.\n\n### Training\n\n#### Prerequisite: Teacher Training\n\nOur approach is based on distilling a teacher model's horizontal reasoning process into the vertical reasoning process of the emulator and the student. 
Therefore, we need to first train a teacher on the task of explicit chain-of-thought reasoning.\n\n```\nexport FOLDER=data/4_by_4_mult\nexport MODEL=gpt2\nexport EPOCHS=1\nexport LR=5e-5\nexport BSZ=32\nexport SAVE=train_models/4_by_4_mult/gpt2/teacher\necho $SAVE\nmkdir -p $SAVE\nTOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=0 stdbuf -oL -eL python src/train_teacher.py \\\n    --train_path ${FOLDER}/train.txt \\\n    --val_path ${FOLDER}/valid.txt \\\n    --epochs $EPOCHS \\\n    --lr $LR \\\n    --batch_size $BSZ \\\n    --base_model $MODEL \\\n    --save_model $SAVE \\\n    \u003e ${SAVE}/log.train 2\u003e\u00261\u0026\n```\n\n#### (a) Mind-Reading the Teacher\n\n![](imgs/training_illustration_a.png)\n\n```\nexport FOLDER=data/4_by_4_mult\nexport DELTA=dynamic\nexport MODEL=gpt2\nexport EPOCHS=40\nexport LR=5e-5\nexport BSZ=32\nexport TEACHER=train_models/4_by_4_mult/gpt2/teacher/checkpoint_0\nexport SAVE=train_models/4_by_4_mult/gpt2/student_initial\nmkdir -p $SAVE\nTOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=0 stdbuf -oL -eL python src/train_mind_reading_student.py \\\n    --train_path ${FOLDER}/train.txt \\\n    --val_path ${FOLDER}/valid.txt \\\n    --epochs $EPOCHS \\\n    --lr $LR \\\n    --batch_size $BSZ \\\n    --base_model $MODEL \\\n    --teacher $TEACHER \\\n    --save_model $SAVE \\\n    --delta $DELTA \\\n    \u003e ${SAVE}/log.train 2\u003e\u00261\u0026\n```\n\n#### (b) Thought Emulation\n\n![](imgs/training_illustration_b.png)\n\n```\nexport FOLDER=data/4_by_4_mult\nexport DELTA=dynamic\nexport MODEL=gpt2\nexport EPOCHS=40\nexport LR=5e-5\nexport BSZ=32\nexport MIXTURE_SIZE=1\nexport TEACHER=train_models/4_by_4_mult/gpt2/teacher/checkpoint_0\nexport SAVE=train_models/4_by_4_mult/gpt2/emulator_initial\nmkdir -p $SAVE\nTOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=0 stdbuf -oL -eL python src/train_thought_emulator.py \\\n    --train_path ${FOLDER}/train.txt \\\n    --val_path ${FOLDER}/valid.txt \\\n    --epochs $EPOCHS \\\n    --lr 
$LR \\\n    --batch_size $BSZ \\\n    --base_model $MODEL \\\n    --teacher $TEACHER \\\n    --save_model $SAVE \\\n    --delta $DELTA \\\n    --mixture_size ${MIXTURE_SIZE} \\\n    \u003e ${SAVE}/log.train 2\u003e\u00261\u0026\n```\n\n#### (c) Couple and Optimize\n\n![](imgs/training_illustration_c.png)\n\n```\nexport FOLDER=data/4_by_4_mult\nexport EPOCHS=40\nexport LR=5e-5\nexport BSZ=32\nexport STUDENT=train_models/4_by_4_mult/gpt2/student_initial/checkpoint_6\nexport EMULATOR=train_models/4_by_4_mult/gpt2/emulator_initial/checkpoint_5\nexport SAVE=train_models/4_by_4_mult/gpt2/\nmkdir -p $SAVE\nTOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=0 stdbuf -oL -eL python src/train_coupled_emulator_and_student.py \\\n    --train_path ${FOLDER}/train.txt \\\n    --val_path ${FOLDER}/valid.txt \\\n    --epochs $EPOCHS \\\n    --lr $LR \\\n    --batch_size $BSZ \\\n    --student $STUDENT \\\n    --emulator $EMULATOR \\\n    --save_model $SAVE \\\n    \u003e ${SAVE}/log.train 2\u003e\u00261\u0026\n```\n\n### Generation \u0026 Evaluation\n\nHere we use a pretrained model as an example. 
Download the folder `models/4_by_4_mult/gpt2`, then the following command will run inference and evaluate both accuracy and throughput, logged in file `generation_logs/4_by_4_mult/log.generate`.\n\n```\nexport FOLDER=data/4_by_4_mult\nexport STUDENT=models/4_by_4_mult/gpt2/student\nexport EMULATOR=models/4_by_4_mult/gpt2/emulator\nexport BSZ=1\nexport SAVE=generation_logs/4_by_4_mult\nmkdir -p $SAVE\nTOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=0 stdbuf -oL -eL python src/generate.py \\\n    --batch_size $BSZ \\\n    --test_path ${FOLDER}/test_bigbench.txt \\\n    --student_path $STUDENT \\\n    --emulator_path $EMULATOR \\\n    \u003e ${SAVE}/log.generate 2\u003e\u00261\u0026\n```\n\n## Citation\n\n```\n@misc{deng2023implicit,\n      title={Implicit Chain of Thought Reasoning via Knowledge Distillation}, \n      author={Yuntian Deng and Kiran Prasad and Roland Fernandez and Paul Smolensky and Vishrav Chaudhary and Stuart Shieber},\n      year={2023},\n      eprint={2311.01460},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fda03%2Fimplicit_chain_of_thought","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fda03%2Fimplicit_chain_of_thought","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fda03%2Fimplicit_chain_of_thought/lists"}