{"id":31660372,"url":"https://github.com/internlm/spark","last_synced_at":"2025-10-07T17:06:29.915Z","repository":{"id":317130545,"uuid":"1062454097","full_name":"InternLM/Spark","owner":"InternLM","description":"An official implementation of \"SPARK: Synergistic Policy And Reward Co-Evolving Framework\"","archived":false,"fork":false,"pushed_at":"2025-09-29T03:57:06.000Z","size":5780,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-29T04:14:35.569Z","etag":null,"topics":["large-language-models","large-vision-language-models","math-reasoning","multi-modal","reward-model","self-improvement","self-rewarding","vision-language-model"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2509.22624","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InternLM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-23T09:23:26.000Z","updated_at":"2025-09-29T03:57:10.000Z","dependencies_parsed_at":"2025-09-29T04:24:46.508Z","dependency_job_id":null,"html_url":"https://github.com/InternLM/Spark","commit_stats":null,"previous_names":["internlm/spark"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/InternLM/Spark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2FSpark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2FSpark/tags","r
eleases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2FSpark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2FSpark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/InternLM","download_url":"https://codeload.github.com/InternLM/Spark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2FSpark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278811851,"owners_count":26050183,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["large-language-models","large-vision-language-models","math-reasoning","multi-modal","reward-model","self-improvement","self-rewarding","vision-language-model"],"created_at":"2025-10-07T17:06:25.886Z","updated_at":"2025-10-07T17:06:29.903Z","avatar_url":"https://github.com/InternLM.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003e\u003cimg src=\"assets/logo.png\" width=\"256\"\u003e\u003c/h1\u003e\n  \u003ch1 align=\"center\"\u003eSpark: Synergistic Policy And Reward Co-Evolving Framework\u003c/h1\u003e\n    \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/Liuziyu77\"\u003e\u003cstrong\u003eZiyu 
Liu\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://yuhangzang.github.io/\"\u003e\u003cstrong\u003eYuhang Zang\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://scholar.google.com/citations?user=iDPJVBsAAAAJ\u0026hl=zh-CN\"\u003e\u003cstrong\u003eShengyuan Ding\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://scholar.google.com/citations?user=sJkqsqkAAAAJ\"\u003e\u003cstrong\u003eYuhang Cao\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://lightdxy.github.io/\"\u003e\u003cstrong\u003eXiaoyi Dong\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://kennymckormick.github.io/\"\u003e\u003cstrong\u003eHaodong Duan\u003c/strong\u003e\u003c/a\u003e\n    ·\n     \u003ca href=\"http://dahua.site/\"\u003e\u003cstrong\u003eDahua Lin\u003c/strong\u003e\u003c/a\u003e\n    ·\n     \u003ca href=\"https://myownskyw7.github.io/\"\u003e\u003cstrong\u003eJiaqi Wang\u003c/strong\u003e\u003c/a\u003e\n  \u003c/p\u003e\n  \u003c!-- \u003ch2 align=\"center\"\u003eAccepted By ICCV 2025!\u003c/h2\u003e --\u003e\n\u003c!-- 🏠\u003ca href=\"https://liuziyu77.github.io/MIA-DPO/\"\u003eHomepage\u003c/a\u003e\u003c/h3\u003e| --\u003e\n  📖\u003ca href=\"https://arxiv.org/abs/2509.22624\"\u003ePaper\u003c/a\u003e |\n  🤗\u003ca href=\"https://huggingface.co/internlm/Spark-VL-7B\"\u003eModels\u003c/a\u003e | 🤗\u003ca href=\"https://huggingface.co/datasets/internlm/Spark-Data\"\u003eDatasets\u003c/a\u003e | 🤗\u003ca href=\"https://huggingface.co/papers/2509.22624\"\u003eDaily Paper\u003c/a\u003e\n\u003cdiv align=\"center\"\u003e\u003c/div\u003e\n\u003cp align=\"center\"\u003e\n  \u003cp\u003e\n🌈\u003cstrong\u003eIntroduction: \u003c/strong\u003e\nWe propose SPARK, \u003cstrong\u003ea unified framework that integrates policy and reward into a single model for joint and synchronous training\u003c/strong\u003e. 
SPARK can automatically derive reward and reflection data from verifiable rewards, enabling \u003cstrong\u003eself-learning and self-evolution\u003c/strong\u003e. Furthermore, we instantiate this framework on multiple backbones, training \u003cstrong\u003eSPARK-VL-7B\u003c/strong\u003e, \u003cstrong\u003eSPARK-7B\u003c/strong\u003e, and \u003cstrong\u003eSPARK-VL-32B\u003c/strong\u003e.\n\n⭐ If you find our code or model helpful, please consider giving us a star — your support means a lot!\n  \u003c/p\u003e\n\u003c!--     \u003ca href=\"\"\u003e\n      \u003cimg src=\"assets/teaser.png\" alt=\"Logo\" width=\"100%\"\u003e \n    \u003c/a\u003e --\u003e\n\u003cbr\u003e\n\n## 📢 News\n- 🚀 [09/29/2025] We release our 🤗\u003ca href=\"https://huggingface.co/datasets/internlm/Spark-Data\"\u003edatasets\u003c/a\u003e.\n- 🚀 [09/29/2025] We release the **Spark** \u003ca href=\"https://arxiv.org/abs/2509.22624\"\u003epaper\u003c/a\u003e.\n- 🚀 [09/29/2025] We upload our evaluation code and model checkpoints.\n- 🚀 [09/29/2025] We release the **Spark** repository.\n\n## 💡 Highlights\n- 🔥 **Synergistic Policy–Reward Co-Evolving (SPARK)**: We introduce SPARK, a unified reinforcement fine-tuning framework that jointly optimizes policy and reward within a single model through on-policy co-evolution.\n- 🔥 **Recycling Rollouts**: Unlike conventional RL pipelines that discard rollouts after policy updates, SPARK recycles RLVR rollouts into pointwise, pairwise, and reflection objectives, enabling the model itself to act as both a strong policy and a generative reward model.\n- 🔥 **Co-Evolving Mechanism**: Improved reward accuracy provides better gradients for policy learning, while stronger reasoning further refines reward judgment, forming a positive feedback loop that enhances reasoning, judgment, and reflection in synergy.\n- 🔥 **Efficient and Practical**: SPARK requires no human preference data, teacher models, or external reward models, making it significantly more data- and 
compute-efficient than traditional RM-based RL pipelines.\n\n\u003ca href=\"\"\u003e\n  \u003cimg src=\"assets/teaser.png\" alt=\"Logo\" \u003e\n\u003c/a\u003e\n\n\n## ⚙️ Framework\n**SPARK** introduces a unified reinforcement learning framework where policy and reward evolve within a single model.\nTraditional RL pipelines either rely on external reward models (**RLHF**) or discard verifiable rewards (**RLVR**). In contrast, SPARK recycles verifiable rewards to guide on-policy reward and reflection data generation.\n\nThis design turns the model into **both a strong policy and a generative reward model**. Through on-policy co-evolving, SPARK establishes a positive feedback loop: **improved reward accuracy provides stronger policy gradients, while better reasoning further enhances reward judgment**.\n\nAs a result, SPARK not only boosts reasoning and judgment simultaneously but also unlocks self-reflection ability at test time, enabling more stable and generalizable performance across diverse tasks.\n\n\u003ca href=\"\"\u003e\n  \u003cimg src=\"assets/framework.png\" alt=\"Logo\" \u003e\n\u003c/a\u003e\n\n## 🛠️ Setup\n```\ngit clone https://github.com/InternLM/Spark.git\nconda create -n Lmm_xc python=3.10\nconda activate Lmm_xc\ncd Spark/Lmm_XC\npip install -e .[vllm]\npip install flash_attn --no-build-isolation\n```\nLmm_XC is developed on top of the LMM-R1 project; for installation details, you can also refer to the LMM-R1 instructions.\n\n## Datasets\n🔦 Our dataset includes the training data for the **Spark-VL-7B** and **Spark-VL-32B** models, as well as a collection of all **multimodal mathematical benchmarks**. It can be downloaded and used directly. Refer to 🤗\u003ca href=\"https://huggingface.co/datasets/internlm/Spark-Data\"\u003edatasets\u003c/a\u003e.\n\n## Inference\nWe have uploaded the model \u003cstrong\u003eSpark-VL-7B\u003c/strong\u003e (\u003ca href=\"https://huggingface.co/internlm/Spark-VL-7B\"\u003e🤗Huggingface\u003c/a\u003e). 
You can use it to evaluate inference performance on Multimodal Mathematical Benchmarks and Reward-Related Benchmarks. \nNote that during training we append the following prompt at the end of the input to facilitate answer extraction, so we recommend appending the same prompt at the end of the input during testing as well.\n```\n Please first conduct reasoning, and then answer the question. Repeat the final answer using a '\\\\boxed{}'.\n```\n\n#### 🤗 Using Transformers\n\nOur model is based on Qwen2.5-VL-7B-Instruct. You can use the same inference code as the Qwen2.5-VL-7B-Instruct model; see \u003ca href=\"https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct\"\u003e🤗Huggingface\u003c/a\u003e.\n```python\nimport torch\nfrom transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor\nfrom qwen_vl_utils import process_vision_info\n\nmodel = Qwen2_5_VLForConditionalGeneration.from_pretrained(\n    \"internlm/Spark-VL-7B\",\n    torch_dtype=torch.bfloat16,\n    attn_implementation=\"flash_attention_2\",\n    device_map=\"auto\",\n)\n\nprocessor = AutoProcessor.from_pretrained(\"internlm/Spark-VL-7B\")\n\n# Set image_path and prompt to your input image and question\nmessages = [\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"image\",\n                \"image\": image_path,\n            },\n            {\"type\": \"text\", \"text\": prompt},\n        ],\n    }\n]\n\n# Preparation for inference\ntext = processor.apply_chat_template(\n    messages, tokenize=False, add_generation_prompt=True\n)\nimage_inputs, video_inputs = process_vision_info(messages)\ninputs = processor(\n    text=[text],\n    images=image_inputs,\n    videos=video_inputs,\n    padding=True,\n    return_tensors=\"pt\",\n)\ninputs = inputs.to(\"cuda\")\n\n# Inference: Generation of the output\ngenerated_ids = model.generate(**inputs, max_new_tokens=128)\ngenerated_ids_trimmed = [\n    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, 
generated_ids)\n]\noutput_text = processor.batch_decode(\n    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False\n)\nprint(output_text)\n```\n\n#### 🔦 Using vLLM\n\nWe recommend using **vLLM** for faster inference; it yields significant speed-ups in dataset evaluation.\n```bash\nPORT=8019\nN_PROC=256\nSERVE_NAME=spark_vl_7b\nMODEL_PATH=/internlm/Spark-VL-7B\n\nCUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve \"$MODEL_PATH\" \\\n  --tensor-parallel-size 4 \\\n  --served-model-name $SERVE_NAME \\\n  --port $PORT \\\n  --max-num-seqs $N_PROC\n```\n\n\n## Training\n\n### Spark Training\nAfter downloading the dataset, you can start training using the following example bash script. Our bash scripts are in ```/Spark/Lmm_XC/XC/scripts/spark_training```.\nYou need to modify the dataset and model paths to your own locations.\n```\nexport WORKSPACE_DIR=\"/fs-computility/....../Lmm_XC\"                 # Path to project root directory\nexport DATASET_PATH=\"/fs-computility/....../infer_data_ViRL_19k.json\"            # Path to your dataset\nexport PRETRAIN_MODEL_PATH=\"/fs-computility/....../Qwen2.5-VL-7B-Instruct\"  # Path to pretrained model\nexport WANDB_PROJECT=\"Observation\"        # Name for this project\nexport MODEL_CPK_NAME=\"Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2\"         # Name for this training run\nexport LOG_PATH='/fs-computility/....../Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2.txt'      # Log file save path\n\n\nexport WANDB_API_KEY=\"......\"\nexport SAVE_PATH=\"/fs-computility/....../${WANDB_PROJECT}/${MODEL_CPK_NAME}\"                   # Absolute path to save everything about this training run\nexport CKPT_PATH=\"${SAVE_PATH}/ckpt\"                                                                    # Path to save checkpoints                                    \nexport FINAL_CKPT_PATH=\"${SAVE_PATH}/final_ckpt\"                                                  
      # Path to save final checkpoints\nexport TIMESTAMP=$(date +%Y%m%d_%H%M%S)                                                                 # Timestamp\nexport CUR_LOG_DIR=\"${SAVE_PATH}/training_logs/${TIMESTAMP}\"                                            # Path to save current run logs\nexport LOG_DIR=\"${SAVE_PATH}/tb_logs\"  \n```\n⏰ Attention:\n```\nexport DEV_MODE=0 # Set to 1 for debug mode on a single dev machine\n```\n\n## Evaluation\nThe integrated multimodal mathematics dataset can be downloaded from 🤗\u003ca href=\"https://huggingface.co/datasets/internlm/Spark-Data\"\u003edatasets\u003c/a\u003e and evaluated using the scripts provided in the `Evaluation` folder. The evaluation results will be stored, and accuracy can subsequently be computed with the `calculate_acc.py` file.\n```\nbash ./Evaluation/eval_spark_vl_7b.sh\npython calculate_acc.py --result_path ./your_result_path.json\n```\n\n## ✒️ Citation\n```\nTBD\n```\n\n## 📄 License\n![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg) ![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg) **Usage and License Notices**: The data and code are intended and licensed for research use only.\nThe data are licensed under Attribution-NonCommercial 4.0 International, and usage should abide by the OpenAI terms of use: https://openai.com/policies/terms-of-use\n\n## Acknowledgement\nWe sincerely thank the \u003ca href=\"https://github.com/TideDra/lmm-r1\"\u003elmm-r1\u003c/a\u003e and \u003ca href=\"https://github.com/OpenRLHF/OpenRLHF\"\u003eOpenRLHF\u003c/a\u003e projects for providing their open-source resources.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finternlm%2Fspark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finternlm%2Fspark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finternlm%2Fspark/lists"}