{"id":50510398,"url":"https://github.com/idanshen/Self-Distillation","last_synced_at":"2026-06-19T14:00:33.608Z","repository":{"id":335188071,"uuid":"1144658404","full_name":"idanshen/Self-Distillation","owner":"idanshen","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-07T20:14:25.000Z","size":2928,"stargazers_count":481,"open_issues_count":3,"forks_count":52,"subscribers_count":10,"default_branch":"main","last_synced_at":"2026-04-07T22:23:53.637Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/idanshen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-28T22:42:43.000Z","updated_at":"2026-04-07T20:14:33.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/idanshen/Self-Distillation","commit_stats":null,"previous_names":["idanshen/self-distillation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/idanshen/Self-Distillation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idanshen%2FSelf-Distillation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idanshen%2FSelf-Distillation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idanshen%2FSelf-Distillation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idanshen%2FSelf-Distillation/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/idanshen","download_url":"https://codeload.github.com/idanshen/Self-Distillation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idanshen%2FSelf-Distillation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34534278,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-19T02:00:06.005Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-02T20:00:26.252Z","updated_at":"2026-06-19T14:00:33.603Z","avatar_url":"https://github.com/idanshen.png","language":"Python","funding_links":[],"categories":["♻️ Self-Distillation with Privileged Context — OPSD"],"sub_categories":[],"readme":"# Self-Distillation Fine-Tuning\n\nThis is TRL-based code for reproducing the On-Policy Self-Distillation algorithm from the paper \"Self-Distillation Enables Continual Learning\" - [https://arxiv.org/abs/2601.19897](https://arxiv.org/abs/2601.19897).\n\n All experiments can be run with a single H200 GPU. 
Other setups may require refactoring and/or changing model sizes.\n\n### Updates\n\n04/07/26: after some investigation, we've found that all the results in our paper were produced using on-policy sampling, but per-token forward KL loss (similar to the [GKD paper](https://arxiv.org/abs/2306.13649)). Therefore, this is the default argument in this repo, and we will update the arXiv version soon with clarification.\n\n03/12/26: added the science dataset and evaluation pipeline, regenerated the tool-use dataset, and added an updated tool-use evaluation file. I'll upload the Medical and Wiki datasets soon.\n\n## Abstract\nContinual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce On-Policy **Self-Distillation Fine-Tuning (SDFT)**, a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.\n\n\n##  Setup\n\n### 1. Clone the repository\n\n```bash\ngit clone https://github.com/Continual-Intelligence/Self-Distillation.git\ncd Self-Distillation\n```\n\n### 2. 
Set up a virtual environment\n\nUsing **conda**:\n\n```bash\nconda create -n distillation python=3.12\nconda activate distillation\n```\n\nUsing **venv**:\n\n```bash\npython3.12 -m venv distillation\nsource distillation/bin/activate\n```\n\n### 3. Install dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n### 4. Usage\n\n#### Tooluse\n\nTraining:\n\n```bash\npython main.py \\\n  --dataset_name tooluse \\\n  --model_name Qwen/Qwen2.5-7B-Instruct \\\n  --output_dir \u003coutput_path\u003e \\\n  --learning_rate 5e-5 \\\n  --num_train_epochs 2\n```\n\nEvaluation:\n\n```bash\npython eval_tooluse_simple.py \\\n  --model_path \u003cpath_to_trained_model\u003e \\\n  --output_dir \u003coutput_path\u003e\n```\n\n#### Science\n\nTraining:\n\n```bash\npython main.py \\\n  --dataset_name science \\\n  --model_name Qwen/Qwen2.5-7B-Instruct \\\n  --output_dir \u003coutput_path\u003e \\\n  --learning_rate 5e-5 \\\n  --num_train_epochs 2\n```\n\nEvaluation:\n\n```bash\npython eval_science.py \\\n  --model_path \u003cpath_to_trained_model\u003e \\\n  --output_dir \u003coutput_path\u003e\n```\n\n### 5. 
Forgetting Evaluation\n\nTo produce the forgetting metrics in the paper, we use the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) by EleutherAI.\n\nTo reproduce the results, install the specific commit we used:\n\n```bash\npip install git+https://github.com/EleutherAI/lm-evaluation-harness@03c44adc0586f88bb343a74da1a1c602103536dd\n```\n\nThen run the following command:\n\n```bash\nlm_eval --model hf --model_args pretrained=\u003cpath_to_your_model\u003e --output_path \u003coutput_dir\u003e --confirm_run_unsafe_code --tasks hellaswag,mmlu,truthfulqa,winogrande,humaneval,ifeval\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidanshen%2FSelf-Distillation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidanshen%2FSelf-Distillation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidanshen%2FSelf-Distillation/lists"}