https://github.com/simonw/blip-caption
Generate captions for images with Salesforce BLIP
- Host: GitHub
- URL: https://github.com/simonw/blip-caption
- Owner: simonw
- Created: 2023-09-10T05:57:22.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-07-11T09:52:49.000Z (almost 2 years ago)
- Last Synced: 2025-06-28T03:06:25.432Z (12 months ago)
- Language: Python
- Size: 4.88 KB
- Stars: 120
- Watchers: 5
- Forks: 11
- Open Issues: 7
Metadata Files:
- Readme: README.md
README
# blip-caption
[PyPI](https://pypi.org/project/blip-caption/)
[Changelog](https://github.com/simonw/blip-caption/releases)
[Tests](https://github.com/simonw/blip-caption/actions?query=workflow%3ATest)
[License](https://github.com/simonw/blip-caption/blob/main/LICENSE)
A CLI tool for generating captions for images using [Salesforce BLIP](https://huggingface.co/Salesforce/blip-image-captioning-base).
## Installation
Install this tool using `pip` or `pipx`:
```bash
pipx install blip-caption
```
The first time you use a model, the tool downloads it from the Hugging Face model hub and stores it in `~/.cache/huggingface/hub/`.
The small model is 945MB. The large model is 1.8GB.
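If you want to check how much disk space the downloaded models are using, here is a minimal sketch. It assumes the default cache location mentioned above has not been relocated, and `cache_size_bytes` is a hypothetical helper, not part of `blip-caption`. Symlinks are skipped so that a file linked from more than one place is only counted once.

```python
from pathlib import Path

# Default cache location named in this README (assumed not to be overridden).
DEFAULT_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def cache_size_bytes(cache_dir: Path) -> int:
    """Sum the sizes of regular files below cache_dir, ignoring symlinks.

    Returns 0 if the directory does not exist, e.g. before the first run.
    """
    if not cache_dir.is_dir():
        return 0
    return sum(
        p.stat().st_size
        for p in cache_dir.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


if __name__ == "__main__":
    size_mb = cache_size_bytes(DEFAULT_CACHE) / 1_000_000
    print(f"{DEFAULT_CACHE}: {size_mb:.1f} MB")
```

Note that this reports the size of the whole hub cache, so it will include any other Hugging Face models you have downloaded.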
## Usage
To generate captions for an image using the small model, run:
```bash
blip-caption IMG_5825.jpeg
```
Example output:
```
a lizard is sitting on a branch in the woods
```
To use the larger model, add `--large`:
```bash
blip-caption IMG_5825.jpeg --large
```
Example output:
```
there is a chamelon sitting on a branch in the woods
```
Here's [the image I used](https://static.simonwillison.net/static/2023/IMG_5924.jpeg).

If you pass multiple files, the path to each file will be output before its caption:
```bash
blip-caption /tmp/photos/*.jpeg
/tmp/photos/IMG_2146.jpeg
a man holding a bowl of salad and laughing
/tmp/photos/IMG_0151.jpeg
a cat laying on a red blanket
```
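If you capture that plain-text output in a script, one way to pair each path with its caption is sketched below. `parse_captions` is a hypothetical helper, and it assumes every path line is followed by exactly one caption line, as in the example above. For anything more robust, the `--json` flag is likely the better choice.

```python
def parse_captions(output: str) -> list[tuple[str, str]]:
    """Pair up alternating path and caption lines from multi-file output."""
    # Drop blank lines, then treat even-indexed lines as paths
    # and odd-indexed lines as the captions that follow them.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return list(zip(lines[0::2], lines[1::2]))


if __name__ == "__main__":
    sample = (
        "/tmp/photos/IMG_2146.jpeg\n"
        "a man holding a bowl of salad and laughing\n"
        "/tmp/photos/IMG_0151.jpeg\n"
        "a cat laying on a red blanket\n"
    )
    for path, caption in parse_captions(sample):
        print(f"{path}: {caption}")
```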
## JSON output
The `--json` flag changes the output to look like this:
```bash
blip-caption /tmp/photos/*.* --json
```
```json
[{"path": "/tmp/photos/IMG_2146.jpeg", "caption": "a man holding a bowl of salad and laughing"},
{"path": "/tmp/photos/IMG_0151.jpeg", "caption": "a cat laying on a red blanket"},
{"path": "/tmp/photos/IMG_3099.MOV", "error": "cannot identify image file '/tmp/photos/IMG_3099.MOV'"}]
```
Any errors are returned as a `{"path": "...", "error": "error message"}` object.
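Here is a rough sketch of consuming that JSON from Python and separating captions from errors. `run_blip_caption` and `split_results` are hypothetical helper names, and the sketch assumes `blip-caption` is installed and on your `PATH` and that each object has either a `caption` or an `error` key, as shown above.

```python
import json
import subprocess


def run_blip_caption(paths: list[str]) -> str:
    """Run blip-caption with --json and return its raw standard output."""
    result = subprocess.run(
        ["blip-caption", *paths, "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def split_results(raw: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split JSON output into {path: caption} and {path: error} dicts."""
    captions: dict[str, str] = {}
    errors: dict[str, str] = {}
    for item in json.loads(raw):
        if "error" in item:
            errors[item["path"]] = item["error"]
        else:
            captions[item["path"]] = item["caption"]
    return captions, errors
```

You could then call `split_results(run_blip_caption(["photo.jpeg"]))` and report the two dictionaries separately.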
## Development
To set up this tool locally, first check out the code. Then create a new virtual environment:
```bash
cd blip-caption
python3 -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
pip install -e '.[test]'
```
To run the tests:
```bash
pytest
```