{"id":20356045,"url":"https://github.com/project-monai/vlm","last_synced_at":"2025-03-17T16:12:50.274Z","repository":{"id":259353100,"uuid":"852098024","full_name":"Project-MONAI/VLM","owner":"Project-MONAI","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-10T13:56:25.000Z","size":60411,"stargazers_count":60,"open_issues_count":7,"forks_count":9,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-10T14:38:19.033Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Project-MONAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-04T08:12:33.000Z","updated_at":"2025-03-10T13:43:24.000Z","dependencies_parsed_at":"2025-03-10T14:41:08.403Z","dependency_job_id":null,"html_url":"https://github.com/Project-MONAI/VLM","commit_stats":null,"previous_names":["project-monai/vlm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Project-MONAI%2FVLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Project-MONAI%2FVLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Project-MONAI%2FVLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Project-MONAI%2FVLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Project-MONAI","download_url":"https://codeload.github.com/Project-MONAI/VLM/tar.gz/refs/heads/main","host":{"nam
e":"GitHub","url":"https://github.com","kind":"github","repositories_count":244066189,"owners_count":20392406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T23:14:49.881Z","updated_at":"2025-03-17T16:12:50.268Z","avatar_url":"https://github.com/Project-MONAI.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/Project-MONAI/MONAI/dev/docs/images/MONAI-logo-color.png\" width=\"30%\"/\u003e\n\u003c/p\u003e\n\n# MONAI Vision Language Models\nThe repository provides a collection of vision language models, benchmarks, and related applications, released as part of Project [MONAI](https://monai.io) (Medical Open Network for Artificial Intelligence).\n\n## 💡 News\n\n- [2024/12/04] The arXiv version of VILA-M3 is now available [here](https://arxiv.org/abs/2411.12915).\n- [2024/10/31] We released the [VILA-M3-3B](https://huggingface.co/MONAI/Llama3-VILA-M3-3B), [VILA-M3-8B](https://huggingface.co/MONAI/Llama3-VILA-M3-8B), and [VILA-M3-13B](https://huggingface.co/MONAI/Llama3-VILA-M3-13B) checkpoints on [HuggingFace](https://huggingface.co/MONAI).\n- [2024/10/24] We presented VILA-M3 and the VLM module in MONAI at MONAI Day ([slides](./m3/docs/materials/VILA-M3_MONAI-Day_2024.pdf), [recording](https://www.youtube.com/watch?v=ApPVTuEtBjc\u0026list=PLtoSVSQ2XzyDOjOn6oDRfEMCD-m-Rm2BJ\u0026index=16))\n- [2024/10/24] Interactive [VILA-M3 Demo](https://vila-m3-demo.monai.ngc.nvidia.com/) is available online!\n\n## VILA-M3\n\n**VILA-M3** is a *vision language model* designed specifically for medical 
applications. \nIt focuses on addressing the unique challenges faced by general-purpose vision-language models when applied to the medical domain and integrated with existing expert segmentation and classification models.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"m3/docs/images/VILA-M3_overview_v2.png\" width=\"95%\"/\u003e\n\u003c/p\u003e\n\nFor details, see [here](m3/README.md).\n\n### Online Demo\n\nPlease visit the [VILA-M3 Demo](https://vila-m3-demo.monai.ngc.nvidia.com/) to try out a preview version of the model.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"m3/docs/images/gradio_app_ct.png\" width=\"70%\"/\u003e\n\u003c/p\u003e\n\n## Local Demo\n\n### Prerequisites\n\n#### **Recommended: Build Docker Container**\n1.  To run the demo, we recommend building a Docker container with all the requirements.\n    We use a [base image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda) with CUDA preinstalled.\n    ```bash\n    docker build --network=host --progress=plain -t monai-m3:latest -f m3/demo/Dockerfile .\n    ```\n2. Run the container:\n    ```bash\n    docker run -it --rm --ipc host --gpus all --net host monai-m3:latest bash\n    ```\n    \u003e Note: If you want to load your own VILA checkpoint in the demo, you need to mount a folder using `-v \u003cyour_ckpts_dir\u003e:/data/checkpoints` in your `docker run` command.\n3. Next, follow the steps to start the [Gradio Demo](./README.md#running-the-gradio-demo).\n\n#### Alternative: Manual installation\n1. **Linux Operating System**\n\n1. 
**CUDA Toolkit 12.2** (with `nvcc`) for [VILA](https://github.com/NVlabs/VILA).\n\n    To verify CUDA installation, run:\n    ```bash\n    nvcc --version\n    ```\n    If CUDA is not installed, use one of the following methods:\n    - **Recommended:** Use the Docker image: `nvidia/cuda:12.2.2-devel-ubuntu22.04`\n        ```bash\n        docker run -it --rm --ipc host --gpus all --net host nvidia/cuda:12.2.2-devel-ubuntu22.04 bash\n        ```\n    - **Manual Installation (not recommended):** Download the appropriate package from the [NVIDIA official page](https://developer.nvidia.com/cuda-12-2-2-download-archive).\n\n1. **Python 3.10**, **Git**, **Wget**, and **Unzip**:\n\n    To install these, run:\n    ```bash\n    sudo apt-get update\n    sudo apt-get install -y wget python3.10 python3.10-venv python3.10-dev git unzip\n    ```\n    NOTE: The commands are tailored for the Docker image `nvidia/cuda:12.2.2-devel-ubuntu22.04`. If using a different setup, adjust the commands accordingly.\n\n1. **GPU Memory**: Ensure that the GPU has sufficient memory to run the models:\n    - **VILA-M3**: 8B: ~18GB, 13B: ~30GB\n    - **CXR**: This expert dynamically loads various [TorchXRayVision](https://github.com/mlmed/torchxrayvision) models and performs ensemble predictions. The memory requirement is roughly 1.5GB in total.\n    - **VISTA3D**: This expert model dynamically loads the [VISTA3D](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/monaitoolkit/models/monai_vista3d) model to segment a 3D-CT volume. The memory requirement is roughly 12GB, and peak memory usage can be higher, depending on the input size of the 3D volume.\n    - **BRATS**: (TBD)\n\n1. ***Setup Environment***: Clone the repository, set up the environment, and download the experts' checkpoints:\n    ```bash\n    git clone https://github.com/Project-MONAI/VLM --recursive\n    cd VLM\n    python3.10 -m venv .venv\n    source .venv/bin/activate\n    make demo_m3\n    ```\n\n### Running the Gradio Demo\n\n1. 
Navigate to the demo directory:\n    ```bash\n    cd m3/demo\n    ```\n\n1. Start the Gradio demo:\n    \u003e This will automatically download the default VILA-M3 checkpoint from Hugging Face.\n    ```bash\n    python gradio_m3.py\n    ```\n\n1. Alternative: Start the Gradio demo with a local checkpoint, e.g.:\n    ```bash\n    python gradio_m3.py \\\n    --source local \\\n    --modelpath /data/checkpoints/\u003c8B-checkpoint-name\u003e \\\n    --convmode llama_3\n    ```\n\u003e For details, see the available [command-line arguments](./m3/demo/gradio_m3.py#L855).\n\n#### Adding your own expert model\n- This is still a work in progress. Please refer to the [README](m3/demo/experts/README.md) for more details.\n\n## Contributing\n\nTo lint the code, please install these packages:\n\n```bash\npip install -r requirements-ci.txt\n```\n\nThen run the following commands:\n\n```bash\nisort --check-only --diff .  # using the configuration in pyproject.toml\nblack . --check  # using the configuration in pyproject.toml\nruff check .  # using the configuration in ruff.toml\n```\n\nTo auto-format the code, run the following command:\n\n```bash\nisort . \u0026\u0026 black . 
\u0026\u0026 ruff format .\n```\n\n## References \u0026 Citation\n\nIf you find this work useful in your research, please consider citing:\n\n```bibtex\n@article{nath2024vila,\n  title={VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge},\n  author={Nath, Vishwesh and Li, Wenqi and Yang, Dong and Myronenko, Andriy and Zheng, Mingxin and Lu, Yao and Liu, Zhijian and Yin, Hongxu and Law, Yee Man and Tang, Yucheng and others},\n  journal={arXiv preprint arXiv:2411.12915},\n  year={2024}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-monai%2Fvlm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fproject-monai%2Fvlm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-monai%2Fvlm/lists"}