{"id":13489998,"url":"https://github.com/cvlab-columbia/viper","last_synced_at":"2025-04-08T10:13:58.550Z","repository":{"id":143065583,"uuid":"614018941","full_name":"cvlab-columbia/viper","owner":"cvlab-columbia","description":"Code for the paper \"ViperGPT: Visual Inference via Python Execution for Reasoning\"","archived":false,"fork":false,"pushed_at":"2024-01-29T21:29:37.000Z","size":26780,"stargazers_count":1692,"open_issues_count":14,"forks_count":122,"subscribers_count":86,"default_branch":"main","last_synced_at":"2025-04-01T09:33:53.242Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cvlab-columbia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-14T18:17:22.000Z","updated_at":"2025-03-30T17:05:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"bc19aa5d-8d16-4ded-b7bd-2d95c5f9ffe1","html_url":"https://github.com/cvlab-columbia/viper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvlab-columbia%2Fviper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvlab-columbia%2Fviper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvlab-columbia%2Fviper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvlab-columbia%2Fviper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cvla
b-columbia","download_url":"https://codeload.github.com/cvlab-columbia/viper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247819933,"owners_count":21001394,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T19:00:39.130Z","updated_at":"2025-04-08T10:13:58.526Z","avatar_url":"https://github.com/cvlab-columbia.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Others","[:robot: machine-learning]([robot-machine-learning)](\u003chttps://github.com/stars/ketsapiwiq/lists/robot-machine-learning\u003e))","多模态大模型"],"sub_categories":["资源传输下载"],"readme":"# ViperGPT: Visual Inference via Python Execution for Reasoning\n\nThis is the code for the paper [ViperGPT: Visual Inference via Python Execution for Reasoning](https://viper.cs.columbia.edu) by [Dídac Surís](https://www.didacsuris.com/)\\*, [Sachit Menon](https://sachit-menon.github.io/)\\* and [Carl Vondrick](https://www.cs.columbia.edu/~vondrick/).\n\n![teaser](teaser.gif \"Teaser\")\n\n## Quickstart\nClone recursively:\n```bash\ngit clone --recurse-submodules https://github.com/cvlab-columbia/viper.git\n```\n\nAfter cloning:\n```bash\ncd viper\nexport PATH=/usr/local/cuda/bin:$PATH\nbash setup.sh  # This may take a while. Make sure the vipergpt environment is active\ncd GLIP\npython setup.py clean --all build develop --user\ncd ..\necho YOUR_OPENAI_API_KEY_HERE \u003e api.key\n```\nThen you can start exploring with the `main_simple.ipynb` notebook. 
For running on datasets instead of individual \nexamples, use `main_batch.py` as discussed later on.\n\n\u003e :warning: WARNING: ViperGPT runs code generated by a large language model. We do not have direct control over this \n\u003e code, so it can be dangerous to run it, especially if modifications to the API are made (the current prompts do not \n\u003e have any dangerous functions like interaction with the filesystem, so it is unlikely that any malicious code can be \n\u003e generated). We cannot guarantee that the code is safe, so use at your own risk, or run in a sandboxed environment.\n\u003e For this reason, the default `execute_code` parameter in the config is `False`. Set it to `True` if you would like the \n\u003e generated code to be executed automatically in `main_batch.py`; otherwise you can execute it yourself (as in \n\u003e `main_simple.ipynb`). \n\n\n\u003e :information_source: NOTE: OpenAI discontinued support for the Codex API on March 23rd, 2023. This repository implements\n\u003e GPT-3.5 Turbo and GPT-4 as alternatives, but we have not tested them extensively; as they are chat models rather than completion models, their behavior likely differs.\n\n## Detailed Installation\nThe easiest way to get started exploring ViperGPT is through `main_simple.ipynb`. To run it, you will need to do the following:\n1. Clone this repository with its submodules.\n2. Install the dependencies. See [Dependencies](#Dependencies).\n3. Download two pretrained models (the rest are downloaded automatically). See [Pretrained models](#Pretrained-models).\n4. Set up the OpenAI key. See [OpenAI key](#OpenAI-key).\n\n### Cloning this Repo\n\n```bash\ngit clone --recurse-submodules https://github.com/cvlab-columbia/viper.git\n```\n\n### Dependencies\n\nFirst, create a conda environment using `setup_env.sh` and then install our modified version of GLIP. 
\nTo do so, just `cd` into the `viper` directory, and run:\n\n```bash\nexport PATH=/usr/local/cuda/bin:$PATH\nbash setup_env.sh\nconda activate vipergpt\ncd GLIP\npython setup.py clean --all build develop --user\n```\n\nPlease make sure to install GLIP as described (i.e., from our provided repo) as we have updated the CUDA kernels to be \ncompatible with newer versions of PyTorch, which are required for other models.\n\n### Pretrained models\n\nNote that ViperGPT may inherit biases from the pretrained models it uses. These biases may be reflected in the outputs \ngenerated by our model. It is recommended to consider this potential bias when using ViperGPT and interpreting its \noutputs.\n\nThis repository implements more models than the ones described in the paper, which can be useful for further research.\nMost of the implemented modules automatically download the pretrained models. However, there are four models that \nneed to be downloaded manually, if they are to be used. They have to be stored in the same directory \n`/path/to/pretrained_models`, by default `./pretrained_models/`, which has to be specified in the configuration (see [Configuration](#Configuration)).\n\nWe provide the convenience script `download_models.sh` to perform this download for you; you can set the variable `$PRETRAINED_MODEL_PATH` to match your config's `/path/to/pretrained_models/`.\n\n#### Pretrained model system requirements\n\nMany of the models used are very large, and require quite a bit of GPU memory. In particular, GLIP and BLIP2 are especially large. Please use smaller variants of those models if running on hardware that cannot support the larger ones; however, this comes at the expense of performance.\n\n### OpenAI key\n\nTo run the OpenAI models, you will need to configure an OpenAI key. This can be done by signing up for an account [e.g. 
here](https://platform.openai.com/), and then creating a key in [account/api-keys](https://platform.openai.com/account/api-keys).\n**Create a file `api.key` and store the key in it.**\n\n## Running the code\n\nOnce the previous steps are done, you can run the Jupyter Notebook `main_simple.ipynb`. This notebook contains \nthe code to try ViperGPT on your own images. The notebook is well documented, and it describes how to use the code.\n\n## Dataset\n\nYou can run ViperGPT on a pre-defined set of query-image/video pairs as well. In order to do that, you will have to \ncreate a `queries.csv` file, which contains the queries and the filenames for the corresponding images/videos. The format of the file is\n`query,answer,image_name/video_name`. The answer is optional, and only needed for evaluation. See `data` for an example.\n\nYour dataset directory will contain the `queries.csv` file as well as the images/videos in the `images`/`videos` \ndirectory. Add the path to the dataset directory in the configuration (see [Configuration](#Configuration)).\n\n## Configuration\n\nAll the configuration parameters are defined in `configs/base_config.yaml`. In order to run the code,\nmodify the paths in the parameters `path_pretrained_models` and optionally `dataset.data_path` to point to the correct \ndirectories.\n\nFor every new configuration you need to run, create a new yaml file in the `configs` directory (like `my_config.yaml`), \nand modify the parameters you need to change. The parameters in the new file will overwrite \nthe ones in `base_config.yaml`. 
Any number of configuration files can be specified; they will be merged in the order \nthey are specified on the command line.\n\nThe `multiprocessing` parameter refers to *both* the batch (every sample is run by a different worker) and the models \n(every model runs in its own process).\n\n## Running the code on a dataset, without the Jupyter notebook\n\nThe code can be run using the following command:\n\n```bash\nCONFIG_NAMES=your_config_name python main_batch.py\n```\n\n`CONFIG_NAMES` is an environment variable that specifies the configuration files to use.\n\nIf you want to run the code using multiprocessing, set `multiprocessing: True` in the config file.\n\nIt is especially important to consider the risks of executing arbitrary code when running in a batch; in particular, if you modify the API or any inputs to Codex, be mindful to not include potentially damaging abilities such as file modification/deletion.\n\n## Code structure\n\nThe code is prepared to run in a multiprocessing manner, from two points of view. First, it runs the models in parallel,\nmeaning that each pretrained model runs in its own process. Second, it runs the samples in parallel, meaning that \nseveral workers are created to run the samples for a given batch. There is a producer-consumer queuing mechanism where \nthe processes controlling the models are the consumers of inputs coming from the workers that run each sample \n(producer). Our implementation allows for batching of samples, which means that several workers can send their inputs to\nthe same model process, which will run them as a batch, and return the output to each worker separately. \n\nThe code has comments and docstrings, but here is a brief overview of the code structure:\n- `vision_models.py`: Contains the code for the pretrained models. Each one of them is a subclass of `BaseModel`.\nImplementing a new model is easy. 
Just create a new class that inherits from `BaseModel` and implement the `forward` \nmethod, as well as the `name` method. The latter will be used to call the model. \n- `vision_processes.py`: Acts as a bridge between the models and the rest of the code. It contains the code to start \nall the required processes, whether multiprocessing or not. It automatically detects all the new models implemented in\n`vision_models.py`. It defines a `forward` method that takes a name as input (as well as arguments), and calls the \nappropriate model.\n- `main_batch.py` and `main_simple.ipynb`: These are the main files to run the code. The former runs the whole dataset and \nis suited for parallel processing of samples, while the latter runs a single image/video and is suited for debugging.\n- `image_patch.py` and `video_segment.py`: These are the classes that represent the image patches and video segments.\nThey contain all the methods that call the `forward` method of `vision_processes.py` and therefore call the models.\n- `configs`: Directory containing the configuration files. The configuration files are in YAML format, and read using \nOmegaConf.\n- `datasets`: Directory containing the code for the datasets. The datasets are subclasses of `torch.utils.data.Dataset`.\n- `prompts`: Directory containing the prompts for Codex and GPT-3. 
The Codex ones define the API specifications.\n- `utils.py`, `useful_lists` and `base_models`: Auxiliary files containing useful functions, lists and pretrained model \nimplementations.\n\n## Citation\n\nIf you use this code, please consider citing the paper as:\n\n```\n@article{surismenon2023vipergpt,\n    title={ViperGPT: Visual Inference via Python Execution for Reasoning},\n    author={D\\'idac Sur\\'is and Sachit Menon and Carl Vondrick},\n    journal={arXiv preprint arXiv:2303.08128},\n    year={2023}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcvlab-columbia%2Fviper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcvlab-columbia%2Fviper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcvlab-columbia%2Fviper/lists"}