{"id":28412429,"url":"https://github.com/andrewliao11/longperceptualthoughts","last_synced_at":"2025-09-23T07:44:14.092Z","repository":{"id":292380674,"uuid":"957552973","full_name":"andrewliao11/LongPerceptualThoughts","owner":"andrewliao11","description":"[COLM'25] The official implementation of \"LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception\"","archived":false,"fork":false,"pushed_at":"2025-08-04T18:23:15.000Z","size":5271,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-18T05:19:24.872Z","etag":null,"topics":["computer-vision","large-language-models","reasoning","reasoning-language-models","vision-language-model","visual-reasoning"],"latest_commit_sha":null,"homepage":"https://andrewliao11.github.io/LongPerceptualThoughts/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrewliao11.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-30T16:45:20.000Z","updated_at":"2025-08-06T15:12:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"6cd989f9-cdf8-4e03-99d9-0ee43d339ec9","html_url":"https://github.com/andrewliao11/LongPerceptualThoughts","commit_stats":null,"previous_names":["andrewliao11/longperceptualthoughts"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/andrewliao11/LongPerceptualThoughts","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewliao11%2FLongPerceptualThoughts","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewliao11%2FLongPerceptualThoughts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewliao11%2FLongPerceptualThoughts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewliao11%2FLongPerceptualThoughts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrewliao11","download_url":"https://codeload.github.com/andrewliao11/LongPerceptualThoughts/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewliao11%2FLongPerceptualThoughts/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276538270,"owners_count":25659932,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-23T02:00:09.130Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","large-language-models","reasoning","reasoning-language-models","vision-language-model","visual-reasoning"],"created_at":"2025-06-02T21:45:10.014Z","updated_at":"2025-09-23T07:44:14.049Z","avatar_url":"https://github.com/andrewliao11.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LongPerceptualThoughts\n\nA data engine that produces long **Chain-of-thoughts** (CoTs) data for visual reasoning. 
This is joint work with Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, and David Acuna.\n\n**🎉 This paper has been accepted to COLM'25. See you in Montreal**\n\n[**paper**](https://arxiv.org/abs/2504.15362) |\n[**website**](https://andrewliao11.github.io/LongPerceptualThoughts/) |\n[**dataset hosted on Hugging Face**](https://huggingface.co/datasets/andrewliao11/LongPerceptualThoughts-30k) |\n[**checkpoints on Hugging Face**](https://huggingface.co/collections/andrewliao11/longperceptualthoughts-6882358a8a6143fe5b4c5f44) |\n[**X post**](https://x.com/andrewliao11/status/1917602672493973818)\n\n![](./assets/overall_pipeline.gif)\n\n## News\n- ⭐ 2025/08/05: released checkpoints\n- ⭐ 2025/05/26: updated LLaMA-Factory version for DPO training\n- ⭐ 2025/05/23: released train and eval code \n- ⭐ 2025/05/09: released code for data generation\n- ⭐ 2025/04/21: released paper and dataset\n\n## 🔧 Usage\n\n\n### Environment setup\n\nPrerequisites\n\n1. CUDA==12.4\n2. torch==2.6.0\n3. transformers==4.53.2\n4. xformers==0.0.29.post2\n\n\nSimple environment setup\n\n```\ngit clone https://github.com/andrewliao11/LongPerceptualThoughts.git --recursive\ncd LongPerceptualThoughts/\n\nconda env create -f environment.yml -n LongPerceptualThoughts\n# or use the script to install the environment line-by-line:\n# conda create -n LongPerceptualThoughts python==3.10 -y\n# conda activate LongPerceptualThoughts\n# bash scripts/install_conda_env.sh\n```\n\nNote: Both LLaMA-Factory and vllm are actively developed open-source projects, so the code may break when versions are mismatched.\n\n\n### Evaluate our checkpoints\n\nThe following snippet will download and prepare the benchmark data in ShareGPT format. It then downloads our checkpoints for evaluation.\n```bash\n# 1. Prepare evaluation benchmark\nbash ./scripts/prepare_benchmark.sh\n# 2. 
Run evaluation using vllm and LLaMA-Factory\nbash ./scripts/evaluate_lpt_checkpoints.sh\n```\n\n### Generate your own LongPerceptualThoughts\n\nWe provide a three-stage data synthesis pipeline using image-caption datasets (e.g., [google/DOCCI](https://huggingface.co/datasets/google/docci)) to generate multiple-choice questions, short CoTs, and long CoTs.\n\n```bash\nexport OPENAI_API_KEY=API_KEY                                     # Model used in stage 1\nexport QWEN2_5_VL_INSTRUCT_PATH=\"/PATH/TO/QWEN2.5-VL-INSTRUCT-7B\" # Model used in stage 2\nexport R1_DISTILLED_QWEN_32_B=\"/PATH/TO/R1-DISTILLED-QWEN-32B\"    # Model used in stage 3\nbash ./scripts/generate_custom_lpt.sh\n```\n\n\n### Download pre-generated LongPerceptualThoughts and Post-train using LLaMA-Factory\n\nThe following snippet will first download pre-generated long CoTs from Hugging Face and then run SFT or DPO using LLaMA-Factory.\n\n```bash\n# Download DOCCI and the pre-generated CoTs \nbash download_and_process_lpt_30k.sh\nexport DISABLE_VERSION_CHECK=1\nexport LLAMAFACTORY_DIR=\"LLaMA-Factory\"\n# The following training configs are for reference. 
You may need to modify `model_name_or_path`, `template`, etc.\nllamafactory-cli train config/llama_factory_sft_train_config.yaml     # SFT training\nllamafactory-cli train config/llama_factory_dpo_train_config.yaml     # DPO training\n```\n\n\n\n## 📚 Citation\n\nIf you find this repository helpful, please cite:\n\n```bibtex\n@misc{liao2025longperceptualthoughtsdistillingsystem2reasoning,\n      title={LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception}, \n      author={Yuan-Hong Liao and Sven Elflein and Liu He and Laura Leal-Taixé and Yejin Choi and Sanja Fidler and David Acuna},\n      year={2025},\n      eprint={2504.15362},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https://arxiv.org/abs/2504.15362}, \n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewliao11%2Flongperceptualthoughts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrewliao11%2Flongperceptualthoughts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewliao11%2Flongperceptualthoughts/lists"}