{"id":18673762,"url":"https://github.com/opencsgs/csg-vl","last_synced_at":"2025-10-28T22:03:20.539Z","repository":{"id":239039479,"uuid":"798208352","full_name":"OpenCSGs/csg-vl","owner":"OpenCSGs","description":"a family of small multimodal models","archived":false,"fork":false,"pushed_at":"2024-11-18T02:58:29.000Z","size":3498,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T21:38:10.822Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenCSGs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-09T10:06:42.000Z","updated_at":"2025-01-29T21:32:58.000Z","dependencies_parsed_at":"2024-11-07T09:16:34.684Z","dependency_job_id":"a026a717-db2a-4c80-abad-83ae300dd8bb","html_url":"https://github.com/OpenCSGs/csg-vl","commit_stats":null,"previous_names":["opencsgs/csg-vl"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenCSGs%2Fcsg-vl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenCSGs%2Fcsg-vl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenCSGs%2Fcsg-vl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenCSGs%2Fcsg-vl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenCSGs","download_url":"https://codeload.github.com/OpenCSGs/csg-vl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248504287,"owners_count":21115141,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T09:16:30.861Z","updated_at":"2025-10-28T22:03:20.456Z","avatar_url":"https://github.com/OpenCSGs.png","language":"Python","readme":"# CSG-VL: A family of small multimodal models\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./icon.png\" alt=\"Logo\" width=\"500\"\u003e\n\u003c/p\u003e\n\nCSG-VL is a family of small but strong multimodal models. It offers multiple plug-and-play vision encoders, like EVA-CLIP, SigLIP and language backbones, including Wukong-1B, Llama-3-8B, Phi-1.5, StableLM-2, Qwen1.5 and Phi-2. 
## News and Updates
* 2024.05.09 🔥 **CSG-VL is released!**

## Quickstart

### HuggingFace transformers

Here is a code snippet showing how to use [CSG-VL-1B-v0.1](https://huggingface.co/opencsg/csg-wukong-1B-VL-v0.1) with HuggingFace transformers.

Before running the snippet, install the following dependencies:

```shell
pip install torch transformers accelerate pillow
```

```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings

# disable some warnings
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')

# set device
torch.set_default_device('cpu')  # or 'cuda'

model_name = 'opencsg/csg-wukong-1B-VL-v0.1'
# create model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True)

# text prompt
prompt = 'What is the astronaut holding in his hand?'
text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)
image = Image.open('example_1.png')
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)

# generate
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True)[0]

print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
```
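If you plan to query the model repeatedly, the snippet above can be wrapped in a small helper. The sketch below is illustrative and not part of the repository; it reuses the `model` and `tokenizer` created above and the same `-200` image-placeholder token id.

```python
# Hypothetical convenience wrapper around the snippet above (not part of the repo).
# Assumes `model` and `tokenizer` have already been created as shown.
IMAGE_TOKEN_INDEX = -200  # placeholder id spliced in for the <image> token, as above

def ask(image_path: str, prompt: str, max_new_tokens: int = 100) -> str:
    text = (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
        f"USER: <image>\n{prompt} ASSISTANT:"
    )
    chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
    input_ids = torch.tensor(chunks[0] + [IMAGE_TOKEN_INDEX] + chunks[1],
                             dtype=torch.long).unsqueeze(0)
    image = Image.open(image_path)
    image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)
    output_ids = model.generate(input_ids,
                                images=image_tensor,
                                max_new_tokens=max_new_tokens,
                                use_cache=True)[0]
    return tokenizer.decode(output_ids[input_ids.shape[1]:],
                            skip_special_tokens=True).strip()

print(ask('example_1.png', 'What is the astronaut holding in his hand?'))
```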
## Install

* CUDA and cuDNN

  We use CUDA 11.8 and cuDNN 8.7.0, via NVIDIA's CUDA Docker image: `docker pull nvcr.io/nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04`. CUDA 12 works as well.

* Create a conda virtual environment and activate it:

  ```shell
  conda create -n csg-vl python=3.10
  conda activate csg-vl
  ```

* Basic requirements

  ```shell
  pip install --upgrade pip  # enable PEP 660 support
  pip install transformers
  pip install torch torchvision xformers --index-url https://download.pytorch.org/whl/cu118
  ```

* Install apex

  ```shell
  # https://github.com/NVIDIA/apex#from-source
  pip install ninja
  git clone https://github.com/NVIDIA/apex
  cd apex
  # if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1), which supports multiple `--config-settings` with the same key
  pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
  # otherwise
  pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  ```

* Install flash-attention

  ```shell
  # https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features
  pip install packaging
  pip install flash-attn --no-build-isolation
  ```

* Install csg-vl and other requirements

  ```shell
  git clone https://github.com/OpenCSGs/csg-vl.git
  cd csg-vl
  pip install -e .
  ```


## Demo

### Gradio Web UI

* Starting the Controller

  First, start the controller. This service orchestrates communication between the web server and model workers.

  ```shell
  python -m csg_vl.serve.controller \
    --host 0.0.0.0 \
    --port 10000
  ```

* Launching the Gradio Web Server

  To interact with the models through a web interface, start the Gradio web server.

  Basic start:

  ```shell
  python -m csg_vl.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --model-list-mode reload
  ```

  If you want to share your web server with others, use the `--share` option. Note that `frpc_linux_amd64_v0.2` may be missing; you can fix this by following the instructions printed on the screen.

  ```shell
  python -m csg_vl.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --model-list-mode reload \
    --share
  ```

  Now you can open the web interface at **the URL printed on the screen**. You may notice that the model list is empty; this is expected, since no model worker has been launched yet. The list updates automatically once a model worker is running.

* Launching Model Workers

  Model workers perform the actual model inference. Configure each worker with the appropriate model and start it.

  * For full-parameter tuning models

    ```shell
    python -m csg_vl.serve.model_worker \
      --host 0.0.0.0 \
      --controller http://localhost:10000 \
      --port 40000 \
      --worker http://localhost:40000 \
      --model-path /path/to/csg-vl/model \
      --model-type wukong
    ```

  * For LoRA tuning models

    You can use `script/merge_lora_weights.py` to merge the LoRA weights with the base LLM and then launch the merged model as above.

    ```shell
    python script/merge_lora_weights.py \
      --model-path /path/to/csg_vl_lora_weights \
      --model-base /path/to/base_llm_model \
      --model-type wukong \
      --save-model-path /path/to/merged_model
    ```

    Alternatively, you can serve the LoRA weights without merging:

    ```shell
    python -m csg_vl.serve.model_worker \
      --host 0.0.0.0 \
      --controller http://localhost:10000 \
      --port 40000 \
      --worker http://localhost:40000 \
      --model-path /path/to/csg_vl_lora_weights \
      --model-base /path/to/base_llm_model \
      --model-type wukong
    ```
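For local experiments it can be convenient to bring the whole stack up from one place. The launcher below is a hypothetical sketch, not part of the repository: it starts the controller, one full-parameter model worker, and the Gradio web server using the same commands as above. The model path and the fixed start-up delay are assumptions to adjust for your setup.

```python
# launch_demo.py -- hypothetical convenience script, not included in the repo.
# Starts the controller, one model worker, and the Gradio web server with the
# same commands shown above.
import subprocess
import sys
import time

MODEL_PATH = "/path/to/csg-vl/model"  # assumption: a full-parameter checkpoint

procs = []

def launch(module_args):
    # Run `python -m <module> <args...>` as a background process.
    procs.append(subprocess.Popen([sys.executable, "-m", *module_args]))

launch(["csg_vl.serve.controller", "--host", "0.0.0.0", "--port", "10000"])
time.sleep(5)  # crude wait for the controller to come up

launch(["csg_vl.serve.model_worker",
        "--host", "0.0.0.0",
        "--controller", "http://localhost:10000",
        "--port", "40000",
        "--worker", "http://localhost:40000",
        "--model-path", MODEL_PATH,
        "--model-type", "wukong"])

launch(["csg_vl.serve.gradio_web_server",
        "--controller", "http://localhost:10000",
        "--model-list-mode", "reload"])

try:
    for p in procs:
        p.wait()
except KeyboardInterrupt:
    for p in procs:
        p.terminate()
```

Stopping the script with Ctrl-C terminates all three services together.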
### CLI Inference (Without Gradio Interface)

For CLI-based inference without the Gradio interface, use the following commands:

* For full-parameter tuning models

  ```shell
  python -m csg_vl.serve.cli \
    --model-path /path/to/csg-vl/model \
    --model-type wukong \
    --image-file /path/to/the/test/image
  ```

* For LoRA tuning models

  You can use `script/merge_lora_weights.py` to merge the LoRA weights with the base LLM and then run the CLI as above.

  ```shell
  python script/merge_lora_weights.py \
    --model-path /path/to/csg_vl_lora_weights \
    --model-base /path/to/base_llm_model \
    --model-type wukong \
    --save-model-path /path/to/merged_model
  ```

  Alternatively, you can run the CLI with the LoRA weights without merging:

  ```shell
  python -m csg_vl.serve.cli \
    --model-path /path/to/csg_vl_lora_weights \
    --model-base /path/to/base_llm_model \
    --model-type wukong \
    --image-file /path/to/the/test/image
  ```

## License
This project uses datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of those licenses.
The content of this project itself is licensed under the [Apache License 2.0](./LICENSE).

## Acknowledgement
We thank the open-source contributors of the following projects, which made this work possible:
* [Bunny](https://github.com/BAAI-DCAI/Bunny) | [LLaVA](https://github.com/haotian-liu/LLaVA)