{"id":34775130,"url":"https://github.com/paradox460/imagecaption","last_synced_at":"2025-12-25T08:14:47.644Z","repository":{"id":329319873,"uuid":"1119095825","full_name":"paradox460/imagecaption","owner":"paradox460","description":"A quick and dirty LLM powered image describing tool","archived":false,"fork":false,"pushed_at":"2025-12-19T06:54:02.000Z","size":2650,"stargazers_count":3,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-12-21T21:23:01.228Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paradox460.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-12-18T18:28:17.000Z","updated_at":"2025-12-20T17:13:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/paradox460/imagecaption","commit_stats":null,"previous_names":["paradox460/imagecaption"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/paradox460/imagecaption","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradox460%2Fimagecaption","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradox460%2Fimagecaption/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradox460%2Fimagecaption/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/paradox460%2Fimagecaption/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paradox460","download_url":"https://codeload.github.com/paradox460/imagecaption/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradox460%2Fimagecaption/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28024420,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-25T02:00:05.988Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-25T08:14:47.575Z","updated_at":"2025-12-25T08:14:47.635Z","avatar_url":"https://github.com/paradox460.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Imagecaption\n\n![Screenshot of the User interface](screenshot.png)\n\nA Phoenix LiveView application for batch processing JPEG images with AI-generated descriptions and keywords. The application provides a web interface for reviewing, editing, and writing EXIF metadata to image files.\n\n## Architecture\n\nThis application implements a workflow for processing collections of JPEG images:\n\n1. Directory scanning using `fd` to locate JPEG files\n2. EXIF metadata extraction via `exiftool`\n3. AI-powered caption generation using a local llama.cpp server\n4. Interactive review and editing interface\n5. 
EXIF metadata writing back to image files\n\nThe system prioritizes existing EXIF data over LLM generation. If an image contains both Description and Keywords EXIF fields, those values are presented for review. Otherwise, the application queries a local LLM to generate descriptions and tags.\n\n## Prerequisites\n\n### Required System Dependencies\n\n- Elixir 1.15 or later\n- Erlang/OTP 26 or later\n- `exiftool` - EXIF metadata manipulation\n- `fd` - Fast file discovery utility\n- A vision-capable AI model served through an OpenAI-compatible API (e.g., llama.cpp with a vision model)\n\n### Local LLM Server Setup\n\nThe application can be fairly token-hungry, so I recommend using a local server to avoid ballooning API costs. You can set up a local llama.cpp server with vision capabilities by following these general steps:\n\n1. Install `llama.cpp` by following the instructions in [its GitHub repository](https://github.com/ggml-org/llama.cpp).\n2. Run the following command to pull a vision-capable model and start the captioning server:\n   ```sh\n   llama-server -hf concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf:Q6_K_L -c 8192 --port 8088\n   ```\n3. If you deviated from the defaults, set the following environment variables when starting the Phoenix server:\n   - `LLM_BASE_URL` - Base URL of llama.cpp server (default: `http://localhost:8088`)\n   - `LLM_MODEL` - Model identifier (default: `Llama-Joycaption-Beta-One-Hf-Llava-Q4_K`)\n\n\n## Installation\n\n```sh\n# Clone and navigate to the project directory\ngit clone https://github.com/paradox460/imagecaption.git\ncd imagecaption\n\n# Check for required system dependencies\nmix check_deps\n\n# Install dependencies\nmix setup\n\n# Start the Phoenix server\nmix phx.server\n```\n\nThe application will be available at `http://localhost:4000`.\n\n### Checking System Dependencies\n\nThe `mix check_deps` task verifies that required system tools are installed:\n\n```sh\nmix check_deps\n```\n\nThis checks for `fd` and `exiftool` and reports their versions. 
If any dependencies are missing, the task will provide installation instructions and exit with a non-zero status code.\n\n## Configuration\n\n### LLM Configuration\n\nLLM parameters are configured via environment variables:\n\n- `LLM_BASE_URL` - Base URL of llama.cpp server (default: `http://localhost:8088`)\n- `LLM_MODEL` - Model identifier (default: `Llama-Joycaption-Beta-One-Hf-Llava-Q4_K`)\n- `LLM_MAX_TOKENS` - Maximum token count for responses (default: `300`)\n- `LLM_TEMPERATURE` - Sampling temperature (default: `0.7`)\n- `LLM_DESCRIPTION_PROMPT` - System prompt for description generation\n- `LLM_TAGS_PROMPT` - System prompt for tag generation\n\nConfiguration is loaded from `config/runtime.exs`.\n\n\n## License\n\n```\nMIT License\n\nCopyright (c) 2025 Jeff Sandberg\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is furnished\nto do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice (including the next\nparagraph) shall be included in all copies or substantial portions of the\nSoftware.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS\nFOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS\nOR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\nWHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF\nOR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparadox460%2Fimagecaption","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fparadox460%2Fimagecaption","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparadox460%2Fimagecaption/lists"}