{"id":22937643,"url":"https://github.com/yuanze-lin/olympus","last_synced_at":"2025-04-01T19:28:07.932Z","repository":{"id":266526734,"uuid":"898567022","full_name":"yuanze-lin/Olympus","owner":"yuanze-lin","description":"The official code for \"Olympus: A Universal Task Router for Computer Vision Tasks\"","archived":false,"fork":false,"pushed_at":"2024-12-12T19:03:54.000Z","size":3098,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-12T20:20:20.073Z","etag":null,"topics":["chatbot","chatgpt","deeplearning","foundation-models","instruction-tuning","llava","llms","mllms","multi-modality","multimodal","pytorch","vision-language-model"],"latest_commit_sha":null,"homepage":"https://yuanze-lin.me/Olympus_page/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yuanze-lin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-04T16:15:54.000Z","updated_at":"2024-12-12T19:03:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"eeb7cc40-4cc3-4565-8c7b-99ebe2c9c6c1","html_url":"https://github.com/yuanze-lin/Olympus","commit_stats":null,"previous_names":["yuanze-lin/olympus"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuanze-lin%2FOlympus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuanze-lin%2FOlympus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuanze-lin%2FOlympus/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuanze-lin%2FOlympus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yuanze-lin","download_url":"https://codeload.github.com/yuanze-lin/Olympus/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237645201,"owners_count":19343747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","chatgpt","deeplearning","foundation-models","instruction-tuning","llava","llms","mllms","multi-modality","multimodal","pytorch","vision-language-model"],"created_at":"2024-12-14T12:13:53.635Z","updated_at":"2025-04-01T19:28:07.925Z","avatar_url":"https://github.com/yuanze-lin.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\u003cimg src=\"https://github.com/yuanze-lin/Olympus/blob/main/asset/olympus.png\" alt=\"icon\" width=\"150\" height=\"150\" style=\"vertical-align:middle; margin-right:5px;\" /\u003e\u003c/p\u003e\n\n# Olympus: A Universal Task Router for Computer Vision Tasks (CVPR 2025) \u003cbr /\u003e \n\n[![PDF](https://img.shields.io/badge/PDF-Download-orange?style=flat-square\u0026logo=adobeacrobatreader\u0026logoColor=white)](https://arxiv.org/pdf/2412.09612)\n[![arXiv](https://img.shields.io/badge/arXiv-2412.09612-b31b1b.svg)](https://arxiv.org/pdf/2412.09612) \n[![Project 
Page](https://img.shields.io/badge/Project%20Page-Visit%20Now-0078D4?style=flat-square\u0026logo=googlechrome\u0026logoColor=white)](https://yuanze-lin.me/Olympus_page/)\n[![Weights](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-FFD21E)](https://huggingface.co/Yuanze/Olympus)\n\nOfficial implementation of \"Olympus: A Universal Task Router for Computer Vision Tasks\" \n\n**:hearts: If you find our project helpful for your research, please give us a :star2: and cite our paper :bookmark_tabs:   : )**\n\n## :mega:  News\n- [ ] Release the code for integration with task-specific models.\n- [x] Release the training \u0026 inference code.\n- [x] Release Olympus datasets.\n- [x] Release the model of Olympus.\n\n\n## :low_brightness: Overview \n\n![image](https://github.com/yuanze-lin/Olympus/blob/main/asset/overview.png)\n\n  \n## Getting Started\n\n### :hammer_and_wrench: Environment Installation \u003ca href=\"#install\" id=\"install\"/\u003e\nTo set up the environment, run the following commands in a shell:\n```\ngit clone https://github.com/yuanze-lin/Olympus.git\ncd Olympus\nconda create -n olympus python==3.10 -y\nconda activate olympus\npip install -r requirements.txt\n```\nThis creates the ```olympus``` conda environment we used.\n\n### Download Models \u0026 Data ###\nWe share our collected Olympus dataset as follows:\n\n| Instruction    | Link |\n|---------|------|\n| Olympus Task-wise Data | [Olympus_20tasks_all](https://drive.google.com/drive/folders/1m3FYHarVG8eg7X7cMAC5N5NBG-p0ymw8?usp=drive_link) |\n| Olympus Fine-tuning Data | [Olympus.json](https://drive.google.com/file/d/1CMLZLa6hkVN2K1ebCcJEOaFGc2cLeLQ7/view?usp=sharing) |\n\n- ```Olympus_20tasks_all```: There are 20 JSON files, each corresponding to a specific task, along with the chain-of-action data provided in ```coa.json```. You can refer to the routing token definitions in our paper to identify the task associated with each task JSON file. 
Each of these 21 JSON files includes both training and test data.\n- ```Olympus.json```: The final fine-tuning data.\n\n\n(1) Download the Olympus model:\n```\npython download_olympus.py\n```\nThis saves the ```Olympus``` model under the ```ckpts``` folder.\n\n(2) Download the Olympus data for fine-tuning:\n```\npython download_olympus_json.py\n```\nThe JSON data will be saved as ```Olympus.json``` in the ```train_data``` folder. Note that ```Olympus.json``` includes ```llava_v1_5_mix665k.json``` combined with our collected data from 20 tasks.\n\n**To merge the data manually, first create a ```jsons``` folder with ```mkdir jsons```, download all the JSON files from [Olympus_20tasks_all](https://drive.google.com/drive/folders/1m3FYHarVG8eg7X7cMAC5N5NBG-p0ymw8?usp=drive_link) and [llava_v1_5_mix665k.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) into the ```jsons``` folder, then run the merge script:**\n\n```\npython scripts/merge_data.py\n```\n\n(3) Download the Mipha-3B model for fine-tuning:\n```\npython download_mipha_3b.py\n```\nThis saves the ```Mipha-3B``` model under the ```ckpts``` folder.\n\n### Inference\n\nRun the following command for inference: \n```\nmodel_name=Olympus\nMODELDIR=ckpts/$model_name\n\npython predict.py \\\n  --prompt \"Generate an image of a fluffy orange cat lounging on a windowsill, \\\nwith sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \\\nNext, would it be possible to change the cat's color to white? This change will make it more eye-catching. \\\nIn the following step, produce a high-resolution 3D model based on the modified image. \\\nAt the next point, please show a video of a cat and a dog running on a playground.\" \\\n  --model-path $MODELDIR \\\n  --temperature 0 \\\n  --conv-mode v0\n```\nAlternatively, you can run ```bash predict.sh``` as we did. 
\n\nThe prediction should look like:\n```\nInput Prompt:  Generate an image of a fluffy orange cat lounging on a windowsill,\nwith sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.\nNext, would it be possible to change the cat's color to white? This change will make it more eye-catching.\nIn the following step, produce a high-resolution 3D model based on the modified image.\nAt the next point, please show a video of a cat and a dog running on a playground.\n\nOutput:  \u003cimage_gen\u003ea fluffy orange cat lounging on a windowsill, with sunlight streaming\nthrough the glass and casting soft shadows to create a cozy atmosphere.\u003c/image_gen\u003e\n\u003cimage_edit\u003echange the cat's color to white.\u003c/image_edit\u003e\n\u003c3D_gen_image\u003eproduce a high-resolution 3D model based on the modified image.\u003c/3D_gen_image\u003e\n\u003cvideo_gen\u003ea cat and a dog running on a playground.\u003c/video_gen\u003e\n```\nChange the ```--prompt``` argument to customize the input prompt as needed.\n\n### Visual Instruction Tuning\nPlease refer to [the LLaVA instructions](https://github.com/haotian-liu/LLaVA/blob/9a26bd1435b4ac42c282757f2c16d34226575e96/README.md#visual-instruction-tuning) to prepare the instruction tuning data. In particular, store the images from the different datasets under the ```train_data``` folder.\n\nRun the following command to fine-tune the model: \n```\nbash scripts/mipha/finetune.sh\n```\n\n### Evaluation\nTo evaluate the model's performance on different benchmarks, see [Evaluation.md](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md).\n\nPlease place the evaluation data under the ```eval``` folder. 
The evaluation scripts are located under ```scripts/mipha/eval/```.\nFor example, to test the model's performance on the VQAv2 dataset, run:\n\n```\nbash scripts/mipha/eval/vqav2.sh\n```\n\n## :crystal_ball: Supported Capabilities (Covering 20 tasks)\n\n![image](https://github.com/yuanze-lin/Olympus/blob/main/asset/capacities.png)\n\n\n## :snowboarder: Diverse Applications\n\n![image](https://github.com/yuanze-lin/Olympus/blob/main/asset/application.png)\n\n## Citation\n\nIf you find Olympus useful for your research and applications, please cite using this BibTeX:\n\n```\n@article{lin2024olympus,\n  title={Olympus: A Universal Task Router for Computer Vision Tasks},\n  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},\n  journal={arXiv preprint arXiv:2412.09612},\n  year={2024}\n}\n```\n\n## Acknowledgement\nOur project is built upon the following foundations:\n\n- [Mipha](https://github.com/xmoanvaf/llava-phi): An impressive open-source project for lightweight vision-language assistants\n- [LLaVA](https://github.com/haotian-liu/LLaVA): A powerful open-source vision-language assistant project\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyuanze-lin%2Folympus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyuanze-lin%2Folympus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyuanze-lin%2Folympus/lists"}