{"id":18266278,"url":"https://github.com/ssube/label-prompt-caption","last_synced_at":"2026-05-09T16:55:11.823Z","repository":{"id":256225876,"uuid":"854321609","full_name":"ssube/label-prompt-caption","owner":"ssube","description":null,"archived":false,"fork":false,"pushed_at":"2024-09-09T17:18:46.000Z","size":130,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-09T02:19:47.696Z","etag":null,"topics":["annotations","captioning","captioning-images","dataset","llama3","llm","vlm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ssube.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-09T00:26:42.000Z","updated_at":"2024-09-15T20:38:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"fd62e6c9-abfa-40b7-b250-538f231cf6c4","html_url":"https://github.com/ssube/label-prompt-caption","commit_stats":null,"previous_names":["ssube/label-prompt-caption"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ssube/label-prompt-caption","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssube%2Flabel-prompt-caption","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssube%2Flabel-prompt-caption/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssube%2Flabel-prompt-caption/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssube%2Flabel-prompt-caption/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ssube","download_url":"https://codeload.github.com/ssube/label-prompt-caption/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssube%2Flabel-prompt-caption/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276243191,"owners_count":25609215,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-21T02:00:07.055Z","response_time":72,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotations","captioning","captioning-images","dataset","llama3","llm","vlm"],"created_at":"2024-11-05T11:22:43.317Z","updated_at":"2025-09-21T12:37:04.570Z","avatar_url":"https://github.com/ssube.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Label-Prompt-Caption Studio\n\n- [Label-Prompt-Caption Studio](#label-prompt-caption-studio)\n  - [What](#what)\n    - [Method](#method)\n    - [Models](#models)\n  - [Why](#why)\n  - [How](#how)\n    - [Setup](#setup)\n    - [Usage](#usage)\n    - [Configuration](#configuration)\n    - [Metadata](#metadata)\n  - [TODOs](#todos)\n\n## What\n\nThis is a Gradio UI for captioning small and medium datasets containing hundreds or thousands of images using a variety\nof natural language and keyword/tag captioning models.\n\nI have prepared [a dataset of animals in 
hats](https://huggingface.co/datasets/ssube/animals-in-hats) that can be used\nto demonstrate the labels and the UI.\n\n### Method\n\nThe name describes the captioning method:\n\n1. Some **Labels** for critical details are applied to each image by humans or AI, such as `animal: duck` and `hat: red rain hat`\n2. A **Prompt** is created for each image using a prompt template and the image labels, such as `Describe this image of a duck wearing a red rain hat.`\n3. A **Caption** is generated by passing the prompt to Florence, Joy, or another captioning model, such as `a cartoon picture of duck wearing red rain hat. The image is a digital drawing in a cartoon style, featuring a cheerful, anthropomorphic duck character.`\n\nYou can add a prefix or suffix to the caption using [a Jinja\ntemplate](https://jinja.palletsprojects.com/en/3.0.x/templates/). Caption and prompt templates use the same syntax and\nhave access to the same image labels, except for the `{{ caption }}` variable, which is the caption returned by the\nmodel.\n\n### Models\n\nThis uses a few different ML models:\n\n- https://huggingface.co/microsoft/Florence-2-large-ft for captioning\n- https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha and https://huggingface.co/meta-llama/Meta-Llama-3.1-8B for captioning\n  - compatible with other Llama models, including ones that do not require personal information\n- https://github.com/vikhyat/moondream for captioning and question answering\n\n## Why\n\nUsing labels to describe critical details and passing those labels on to the prompt helps the captioning model to\navoid mistakes and hallucinations.\n\nMistakes in the captions can cause problems later during training, especially if large or focal details are\nmis-identified. 
Providing additional detail in the prompt can also help the captioning model to identify concepts it\nis not familiar with.\n\n## How\n\nThe required labels are defined for each group, along with templates for the image caption and templates for each\ncaptioning model's prompt. The image labels are used to format those templates, providing more information to the\ncaptioning models.\n\n### Setup\n\nClone this repository:\n\n```shell\n\u003e git clone git@github.com:ssube/label-prompt-caption.git\n```\n\nSet up a virtual environment:\n\n```shell\n\u003e python3 -m venv venv\n```\n\nInstall the requirements in the virtual environment:\n\n```shell\n\u003e source venv/bin/activate\n\u003e pip3 install -r requirements.txt\n```\n\n### Usage\n\nUsing the virtual environment, run the server:\n\n```shell\n\u003e source venv/bin/activate\n\u003e python3 -m lpc\n```\n\nOpen the web UI in your browser. A link to the web UI will be shown in the logs, usually http://127.0.0.1:7860/.\n\n1. On the `Dataset` tab, enter the `Base Path` for your dataset.\n   1. This is the top-level directory which contains all of the images and group sub-directories.\n2. Press the `Load Groups` button\n   1. This will scan the dataset directory for images matching the `Image Formats`\n3. Press the `View Group` button next to a group\n4. Switch to the `Group` tab\n5. Press the `Load Group` button\n   1. This will load four additional sections: `Group Captions`, `Group Prompts`, `Group Taxonomy`, and `Group Images`\n6. Provide a `Caption Template`\n   1. You can use the template to add a prefix to every caption in the group, like `picture of {{ subject }}. {{ caption }}`\n   2. The `{{ caption }}` variable will be set to the captioning model's output\n7. Provide one or more `Group Prompts`\n   1. For Florence, you can use one of `\u003cCAPTION\u003e`, `\u003cDETAILED_CAPTION\u003e`, or `\u003cMORE_DETAILED_CAPTION\u003e`\n   2. 
For Joy, `Write a detailed description for this image of {{ subject }}.` is a good default, but you can modify\n      the prompt to include more details, the mood of the image, or any other helpful information.\n8. Add any required labels to the `Group Taxonomy`\n   1. These should include any variables in your `Caption Template` and `Group Prompts`, like `subject` in this example\n   2. You do not need to include `caption` here\n9. Select an image\n10. Switch to the `Image` tab\n11. Add annotations for any missing labels\n    1. In this example, the `subject` might be a `dog`\n12. The `Image Prompts` will show your `Group Prompts` templated with the labels and values from the image annotations\n13. Press one of the `Caption with Florence` or `Caption with Joy` buttons\n14. The `Image Caption` should update with a new caption describing your selected image\n15. Modify the caption until it accurately describes the image\n    1. The `Shuffle Phrases` button will randomly shuffle each phrase, split on commas\n    2. The `Remove Newlines` button will remove any newlines in the caption\n    3. The `Strip Partial Phrases` button will remove any text after the last `.`, in case the captioning model returned\n       an incomplete phrase at the end of the caption\n16. Press the `Save Image Caption` button to save the caption to a `.txt` file\n\n### Configuration\n\nIf you are not comfortable sharing your contact information with Meta, you can use an alternative Llama model by\nsetting the `LPC_LLAMA_MODEL` environment variable. For example:\n\n```shell\n\u003e export LPC_LLAMA_MODEL=cognitivecomputations/dolphin-2.9.4-llama3.1-8b\n```\n\n### Metadata\n\nFor ease of editing, the metadata is stored in a `meta.yaml` file in each directory where images were found:\n\n```yaml\ngroup:\n  caption: a {{ style }} picture of {{ animal }} wearing {{ hat }}. 
{{ caption }}\n  prompt:\n    Florence: \u003cMORE_DETAILED_CAPTION\u003e\n    Joy: Please write a detailed description of this {{ animal }} wearing {{ hat }}.\n    Moondream: Describe this image in detail.\n  required_labels:\n  - style\n  - animal\n  - hat\nimages:\n  00092-2473709667.png:\n    annotations:\n    - bounding_box: null\n      label: animal\n      value: duck\n    - bounding_box: null\n      label: style\n      value: cartoon\n    - bounding_box: null\n      label: hat\n      value: red rain hat\n```\n\nImages are stored by filename only, relative to the dataset and group, so that directories can be moved around and\nshared without changing the metadata.\n\n## TODOs\n\n- Group captioning with batching\n- Implement the previous/next buttons\n- Switch the group/image tab after selecting a group/image\n- Group-level default labels (mark the whole directory as `style=cartoon`)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssube%2Flabel-prompt-caption","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fssube%2Flabel-prompt-caption","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssube%2Flabel-prompt-caption/lists"}