# CRISP: Compressed Reasoning via Iterative Self-Policy Distillation (formerly OPSDC: On-Policy Self-Distillation for Reasoning Compression)

This repository contains the code for **CRISP** (**C**ompressed **R**easoning via **I**terative **S**elf-**P**olicy Distillation). CRISP teaches reasoning models to think more concisely by distilling their own concise behavior back into themselves.

**Paper:** [CRISP: Compressed Reasoning via Iterative Self-Policy Distillation](crisp_compressed_reasoning_via_iterative_self_policy_distillation.pdf) | [arXiv](https://arxiv.org/abs/2603.05433)

**Authors:** Hejian Sang\*, Yuanda Xu\*, Zhengze Zhou\*, Ran He\*, Zhipeng Wang, Jiachen Sun

**Related write-up:** [Scorer Choice in Math Reasoning Evaluation](https://zhengzezhou.github.io/math-scorer-choice/). It gives a four-policy decomposition of how verifier choice (answer extraction vs. symbolic equivalence) can swing reported MATH-500 accuracy by up to ~80 percentage points on identical generations.

## Key Idea

Reasoning models think out loud, but much of what they say is noise. CRISP rests on one almost trivial idea: *ask the model to be concise, then teach it to be concise without being asked*.

- **Teacher**: The same model, conditioned on a conciseness instruction (e.g., "Solve concisely, avoid unnecessary steps").
- **Student**: The same model, without the conciseness instruction.

Training samples rollouts from the student. It then minimizes the per-token reverse KL divergence between the student's and the teacher's next-token distributions on those rollouts. The method needs no ground-truth answers, no token budgets, and no difficulty estimators.

## Results

| Model | Benchmark | Token Reduction | Accuracy Change |
|-------|-----------|-----------------|-----------------|
| Qwen3-8B | MATH-500 | 59% | +9 pts (77% → 86%) |
| Qwen3-14B | MATH-500 | 57% | +16 pts (70% → 86%) |
| Qwen3-14B | AIME 2024 | 41% | +10 pts |

Several other properties hold:

- Compression adapts to problem difficulty: easy problems are compressed about 1.6x more than hard ones.
- Entropy remains stable throughout training.
- General capabilities, as measured by MMLU, are fully preserved.

## Repository Structure

```
CRISP_Reasoning_Compression/
├── verl/                          # VERL framework (forked, with minor fixes)
├── workspace/
│   ├── config/
│   │   └── prompts.json           # Prompt templates (student, teacher, length prune)
│   ├── data/
│   │   ├── DAPO-Math-17k-dedup/   # Training data (17k math problems)
│   │   ├── MATH-500/              # Validation benchmark
│   │   ├── aime24/                # AIME 2024 validation
│   │   └── aime25/                # AIME 2025 validation
│   ├── src/
│   │   ├── data/
│   │   │   ├── process_eval_data.py          # Process eval datasets (train/val splits)
│   │   │   ├── prepare_length_prune_data.py  # Generate length pruning prompts
│   │   │   └── prepare_self_distill_data.py  # Generate self-distill prompts (with teacher solutions)
│   │   └── self_distill_hybrid/
│   │       ├── main_opsd.py       # OPSD entry point
│   │       ├── opsd_trainer.py    # OPSD trainer (JSD/reverse-KL loss)
│   │       ├── opsd_worker.py     # OPSD FSDP worker
│   │       ├── sd_worker.py       # Base self-distill worker
│   │       ├── sd_dataset.py      # Dataset for paired teacher/student prompts
│   │       └── sd_verifier.py     # Math answer verification
│   ├── scripts/sft/
│   │   └── train_opsd.sh          # Main training launch script
│   └── execution-configs/         # Hyperparameter configs for Qwen3-8B and 14B
```

## Setup

### Prerequisites

- 8x H100 (80 GB) or H200 GPUs
- Python 3.10+
- CUDA 12.4+

### Installation

```bash
git clone https://github.com/HJSang/CRISP_Reasoning_Compression.git
cd CRISP_Reasoning_Compression

# Install VERL and dependencies
cd verl
pip install -e .
cd ..

# Install additional dependencies
pip install sglang pandas datasets hydra-core omegaconf
```

## Quick Start

The full pipeline has three stages.

### Stage 1: Process Evaluation Data

This stage does two things:

- It splits DAPO-Math-17k-dedup into train/val sets.
- It prepares the validation benchmarks (MATH-500, AIME 2024, AIME 2025).

Run it from the repository root:

```bash
cd workspace/src/data

python process_eval_data.py \
    --data_dir ../../data \
    --output_dir ../../data/processed
```

This produces the following files, relative to `workspace/`:
- `data/processed/train.parquet`: DAPO training split (95%)
- `data/processed/val_dapo.parquet`: DAPO validation split (5%)
- `data/processed/val_math500.parquet`, `val_aime24.parquet`, `val_aime25.parquet`: evaluation benchmarks

### Stage 2: Generate Length Pruning Prompts

Create the paired teacher/student prompts for OPSD training. The teacher prompt adds a conciseness instruction. The student prompt is the original DAPO-Math prompt, unchanged. Run this from `workspace/src/data`, the same directory as Stage 1:

```bash
# Batch mode (recommended): generates all 4 variants with a shared 80/20 train/val split
python prepare_length_prune_data.py batch \
    --input-parquet ../../data/DAPO-Math-17k-dedup/distinct-prompts-with-rewards.parquet \
    --output-root ../../data

# This creates:
#   data/length_prune_concise/  ("Solve concisely" teacher prompt)
#   data/length_prune_20pct/    ("Use 20% fewer tokens" teacher prompt)
#   data/length_prune_50pct/    ("Use 50% fewer tokens" teacher prompt)
#   data/length_prune_80pct/    ("Use 80% fewer tokens" teacher prompt)
#
# Each directory contains:
#   self_distill_prompts.parquet      (training prompts)
#   self_distill_prompts_val.parquet  (validation prompts)
```

### Stage 3: Train OPSD

Launch OPSD training with the VERL HybridEngine, which uses sglang for generation and FSDP for training. Run the commands below from the repository root.

#### Qwen3-8B

```bash
MODEL_PATH=/path/to/Qwen3-8B \
SD_PROMPTS_PATH=./workspace/data/length_prune_concise/self_distill_prompts.parquet \
SD_VAL_PROMPTS_PATH=./workspace/data/length_prune_concise/self_distill_prompts_val.parquet \
OPSD_BETA=0.5 \
SD_TEMPERATURE=1.0 \
SD_TOP_P=1.0 \
SD_MAX_TOKENS=8192 \
SFT_MAX_LENGTH=10240 \
TOTAL_EPOCHS=1 \
TRAIN_BATCH_SIZE=32 \
MICRO_BATCH_SIZE=2 \
LEARNING_RATE=1e-6 \
TP_SIZE=2 \
GPU_MEM_UTIL=0.75 \
ULYSSES_SP_SIZE=4 \
MAX_PROMPT_LENGTH=1024 \
MAX_RESPONSE_LENGTH=30000 \
VAL_MAX_TOKENS=30000 \
CHECK_STRUCTURE=false \
USE_LIGER=true \
OPSD_LOSS_TYPE=reverse_kl \
TEACHER_UPDATE_FREQ=50 \
EXPERIMENT_NAME=opsd_length_prune_concise \
bash workspace/scripts/sft/train_opsd.sh
```

#### Qwen3-14B

This command is identical to the Qwen3-8B command except for `MODEL_PATH`:

```bash
MODEL_PATH=/path/to/Qwen3-14B \
SD_PROMPTS_PATH=./workspace/data/length_prune_concise/self_distill_prompts.parquet \
SD_VAL_PROMPTS_PATH=./workspace/data/length_prune_concise/self_distill_prompts_val.parquet \
OPSD_BETA=0.5 \
SD_TEMPERATURE=1.0 \
SD_TOP_P=1.0 \
SD_MAX_TOKENS=8192 \
SFT_MAX_LENGTH=10240 \
TOTAL_EPOCHS=1 \
TRAIN_BATCH_SIZE=32 \
MICRO_BATCH_SIZE=2 \
LEARNING_RATE=1e-6 \
TP_SIZE=2 \
GPU_MEM_UTIL=0.75 \
ULYSSES_SP_SIZE=4 \
MAX_PROMPT_LENGTH=1024 \
MAX_RESPONSE_LENGTH=30000 \
VAL_MAX_TOKENS=30000 \
CHECK_STRUCTURE=false \
USE_LIGER=true \
OPSD_LOSS_TYPE=reverse_kl \
TEACHER_UPDATE_FREQ=50 \
EXPERIMENT_NAME=opsd_length_prune_concise \
bash workspace/scripts/sft/train_opsd.sh
```

Preconfigured hyperparameter files for the ablations are in `workspace/execution-configs/`. They cover teacher update frequency and compression strength.

## Key Hyperparameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `OPSD_LOSS_TYPE` | `reverse_kl` | Loss type: `reverse_kl` or `jsd` |
| `OPSD_BETA` | `0.5` | JSD interpolation weight (used only when `OPSD_LOSS_TYPE=jsd`) |
| `TEACHER_UPDATE_FREQ` | `50` | Steps between teacher weight updates (`0` = frozen teacher) |
| `SD_TEMPERATURE` | `1.0` | Student rollout temperature |
| `SD_MAX_TOKENS` | `8192` | Maximum tokens per student generation |
| `SFT_MAX_LENGTH` | `10240` | Maximum sequence length for training |
| `CHECK_STRUCTURE` | `false` | Whether to require `<think>` tags in responses |
| `USE_LIGER` | `true` | Memory-efficient loss computation via logsumexp |

## How It Works

1. **Generate**: sglang samples student responses from the question-only (student) prompts.
2. **Score**: A teacher forward pass on the conciseness-augmented prompt computes logits for the same student-generated tokens.
3. **Train**: Minimize the per-token reverse KL (or JSD, when `OPSD_LOSS_TYPE=jsd`) between the student and teacher distributions. The loss is applied to *all* responses, with no correctness filtering.
4. **Sync**: The updated weights are synced back to sglang automatically for the next generation step.
5. **Refresh teacher**: Every `TEACHER_UPDATE_FREQ` steps, the student weights are copied into the teacher, so compression becomes progressively stronger over training.

## Acknowledgments

Built on top of [VERL](https://github.com/volcengine/verl) (HybridEngine for combined generation and training).

## Citation

```bibtex
@article{sang2025crisp,
  title={CRISP: Compressed Reasoning via Iterative Self-Policy Distillation},
  author={Sang, Hejian and Xu, Yuanda and Zhou, Zhengze and He, Ran and Wang, Zhipeng and Sun, Jiachen},
  journal={arXiv preprint arXiv:2603.05433},
  year={2026}
}
```
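## Illustrative Sketch: Per-Token Distillation Loss

Step 3 of How It Works can be illustrated with a minimal, dependency-free Python sketch. This is not the code in `opsd_trainer.py`. It rests on the following assumptions:

- "Reverse KL" means KL(student ‖ teacher), computed from each model's logits at every position of a student-generated response.
- The `jsd` option follows a generalized JSD whose mixture is `OPSD_BETA * teacher + (1 - OPSD_BETA) * student`. Under this form, `beta = 0.5` gives the symmetric JSD.

All function names here are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax using the logsumexp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_from_logprobs(log_a: Sequence[float], log_b: Sequence[float]) -> float:
    """KL(a || b) for two distributions given as log-probabilities."""
    return sum(math.exp(la) * (la - lb) for la, lb in zip(log_a, log_b))


def reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """Reverse KL at one position: KL(student || teacher) (assumed convention)."""
    return kl_from_logprobs(log_softmax(student_logits), log_softmax(teacher_logits))


def generalized_jsd(student_logits: Sequence[float], teacher_logits: Sequence[float],
                    beta: float = 0.5) -> float:
    """Assumed JSD form with mixture m = beta * teacher + (1 - beta) * student."""
    log_p = log_softmax(student_logits)  # student
    log_q = log_softmax(teacher_logits)  # teacher
    log_m = [
        math.log(beta * math.exp(lq) + (1.0 - beta) * math.exp(lp))
        for lp, lq in zip(log_p, log_q)
    ]
    return beta * kl_from_logprobs(log_q, log_m) + (1.0 - beta) * kl_from_logprobs(log_p, log_m)


def sequence_loss(student_seq: Sequence[Sequence[float]], teacher_seq: Sequence[Sequence[float]],
                  loss_type: str = "reverse_kl", beta: float = 0.5) -> float:
    """Mean per-token loss over the positions of one student-generated response."""
    if loss_type == "reverse_kl":
        per_token = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    elif loss_type == "jsd":
        per_token = [generalized_jsd(s, t, beta) for s, t in zip(student_seq, teacher_seq)]
    else:
        raise ValueError(f"unknown loss_type: {loss_type}")
    return sum(per_token) / len(per_token)
```

Averaging uniformly over positions is one simple reduction. The actual trainer may mask padding or weight tokens differently.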
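## Illustrative Sketch: Paired Prompts and a Shared Split

Stage 2 pairs each question with a student prompt and a teacher prompt, and reuses one train/val split across all four variants. The plain-Python sketch below shows that idea under several assumptions:

- The instruction strings are hypothetical paraphrases of the variant labels listed in Stage 2. The real templates live in `workspace/config/prompts.json`.
- The instruction is appended after the question.
- The split is a seeded shuffle.

None of these assumptions describes the actual behavior of `prepare_length_prune_data.py`.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical teacher instructions, paraphrased from the Stage 2 variant names.
# The real templates are in workspace/config/prompts.json.
CONCISENESS_INSTRUCTIONS: Dict[str, str] = {
    "concise": "Solve concisely, avoid unnecessary steps.",
    "20pct": "Use 20% fewer tokens than you normally would.",
    "50pct": "Use 50% fewer tokens than you normally would.",
    "80pct": "Use 80% fewer tokens than you normally would.",
}


def make_prompt_pair(question: str, variant: str = "concise") -> Dict[str, str]:
    """Student sees the question unchanged; teacher also sees the instruction."""
    instruction = CONCISENESS_INSTRUCTIONS[variant]
    return {
        "student_prompt": question,
        "teacher_prompt": f"{question}\n\n{instruction}",  # assumed placement
    }


def shared_split(n: int, train_frac: float = 0.8, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Seeded shuffle computed once, then reused for every variant."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_frac)
    return sorted(indices[:cut]), sorted(indices[cut:])


# Toy usage: one split, four prompt variants built on top of it.
questions = ["What is 2 + 2?", "Find x if 3x = 12.", "Compute 7 * 8."]
train_idx, val_idx = shared_split(len(questions))
datasets = {
    variant: {
        "train": [make_prompt_pair(questions[i], variant) for i in train_idx],
        "val": [make_prompt_pair(questions[i], variant) for i in val_idx],
    }
    for variant in CONCISENESS_INSTRUCTIONS
}
```

Computing the split once and reusing it keeps every variant's validation questions identical. This makes the variants directly comparable.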
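## Illustrative Sketch: Teacher Refresh Schedule

Step 5 of How It Works periodically copies the student into the teacher. The toy loop below uses a single float as a stand-in for the model weights. It shows how `TEACHER_UPDATE_FREQ` could drive that schedule, with `0` meaning a frozen teacher as stated in the hyperparameter table. Three simplifications are assumptions:

- Steps are 1-indexed.
- A refresh happens whenever the step is a multiple of the frequency.
- The generate, score, train, and sync stages are reduced to one placeholder update.

The helper names are hypothetical.

```python
import copy
from typing import Dict, List, Tuple


def should_refresh_teacher(step: int, update_freq: int) -> bool:
    # Assumption: 1-indexed steps; refresh after every `update_freq`-th step.
    # update_freq == 0 keeps the teacher frozen, as in the hyperparameter table.
    return update_freq > 0 and step % update_freq == 0


def run_schedule(total_steps: int, update_freq: int) -> Tuple[List[int], float]:
    """Toy training loop; a single float 'w' stands in for the model weights."""
    student: Dict[str, float] = {"w": 0.0}
    teacher = copy.deepcopy(student)  # teacher starts as the same model
    refresh_steps: List[int] = []
    for step in range(1, total_steps + 1):
        # Placeholder for generate -> score -> train -> sync:
        # pretend each optimizer step moves the student weight by 1.0.
        student["w"] += 1.0
        if should_refresh_teacher(step, update_freq):
            teacher = copy.deepcopy(student)  # copy student weights into the teacher
            refresh_steps.append(step)
    return refresh_steps, teacher["w"]
```

In this scheme, the teacher lags the student between refreshes. Each refresh starts from an already more concise student, which matches the README's description of progressive compression.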