{"id":31962156,"url":"https://github.com/agentmorris/hero-images","last_synced_at":"2025-10-14T16:36:17.421Z","repository":{"id":316456873,"uuid":"1063469771","full_name":"agentmorris/hero-images","owner":"agentmorris","description":"Finding aesthetically pleasing camera trap images in large image collections","archived":false,"fork":false,"pushed_at":"2025-10-01T23:57:15.000Z","size":102,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-02T01:23:34.034Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/agentmorris.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-24T17:06:48.000Z","updated_at":"2025-10-01T23:57:18.000Z","dependencies_parsed_at":"2025-09-24T19:16:58.002Z","dependency_job_id":"741e2b11-71cd-42ff-a4f1-108b42f57b68","html_url":"https://github.com/agentmorris/hero-images","commit_stats":null,"previous_names":["agentmorris/hero-images"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/agentmorris/hero-images","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fhero-images","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fhero-images/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fhero-images/releases","mani
fests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fhero-images/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/agentmorris","download_url":"https://codeload.github.com/agentmorris/hero-images/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fhero-images/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279019575,"owners_count":26086753,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-14T16:36:16.373Z","updated_at":"2025-10-14T16:36:17.411Z","avatar_url":"https://github.com/agentmorris.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Finding \"hero images\" in camera trap image collections\n\nThis is an exploratory project comparing methods for identifying \"hero images\" from camera trap datasets, i.e., aesthetically pleasing wildlife photos.\n\n\n## Project overview\n\nThe system will process large camera trap collections to identify candidates with aesthetic appeal through a two-stage pipeline:\n\n1. **Candidate selection**: Heuristic-based filtering using AI detection results (typically MegaDetector and SpeciesNet) to identify promising images\n2. 
**Labeling**: LLM aesthetic rating using Gemini 2.5 Flash or local VLMs\n\n[This page](https://lila.science/public/misc/hero-image-samples/snapshot-serengeti-by_score/) shows examples of the output of this system on a collection of images from [Snapshot Serengeti](https://lila.science/datasets/snapshot-serengeti/).\n\n\n### Scripts\n\n- **`generate_sequence_aware_candidates_optimized.py`** - Generate candidates for labeling using heuristics\n- **`gemini_labeling.py`** - Gemini API labeling (supports both batch and synchronous modes)\n- **`vllm_local_labeling.py`** - Local VLM labeling via vLLM\n- **`ollama_local_labeling.py`** - Local VLM labeling via Ollama\n- **`generate_label_visualization.py`** - Create HTML visualizations compatible with all labeling results\n\n\n### Modules\n\n- **`stratified_selector_sequence_aware.py`** - Main candidate selection system with sequence awareness\n\n\n## Usage\n\n### Setup\n\nInstall the package in development mode (recommended):\n\n```bash\npip install -e .\n```\n\nOr install just the requirements:\n\n```bash\npip install -r requirements.txt\n```\n\n**For Gemini labeling:** Create an API key file:\n\n```bash\necho \"your-gemini-api-key\" \u003e GEMINI_API_KEY.txt\n```\n\n**For local VLM labeling (Ollama):** Install Ollama:\n\n```bash\ncurl -fsSL https://ollama.com/install.sh | sh\n```\n\n### Select candidates for LLM labeling\n\n```bash\npython -m hero_images.generate_sequence_aware_candidates_optimized\n```\n\n### Label images\n\nAll labeling scripts accept flexible input sources for the first positional argument:\n- **Directory**: Process all images in a directory (use `--recursive` for subdirectories)\n- **Text file**: One absolute image path per line (lines starting with `#` are treated as comments)\n- **JSON file**: A JSON array containing absolute image paths\n\n#### Label images with Gemini 2.5 Flash\n\nThe Gemini labeling script supports two modes:\n\n**Batch mode (default, recommended for large jobs):**\n- 
50% cost discount vs. synchronous API\n- Asynchronous processing (takes hours but runs on Google's servers)\n- Can resume/cancel jobs\n\n```bash\n# From directory\npython -m hero_images.gemini_labeling /path/to/candidates --output-dir /path/to/output --recursive\n\n# From text file with image paths\npython -m hero_images.gemini_labeling /path/to/image_list.txt --output-dir /path/to/output\n\n# From JSON file with image paths\npython -m hero_images.gemini_labeling /path/to/image_list.json --output-dir /path/to/output\n```\n\n**Synchronous mode (good for smaller jobs):**\n- 2x cost vs. batch mode\n- Real-time processing with immediate results\n- Progress updates as images are processed\n- Supports checkpointing and resume (like Ollama labeling)\n\n```bash\n# Basic synchronous processing (with automatic checkpointing every 1000 images)\npython -m hero_images.gemini_labeling /path/to/candidates --output-dir /path/to/output --recursive --sync\n\n# Resume from checkpoint if interrupted\npython -m hero_images.gemini_labeling /path/to/candidates --output-dir /path/to/output --sync --resume /path/to/output/gemini_sync_labels_YYYYMMDD_HHMMSS.tmp.json --recursive\n\n# Custom checkpoint interval\npython -m hero_images.gemini_labeling /path/to/candidates --output-dir /path/to/output --sync --checkpoint-interval 500 --recursive\n\n# Disable checkpointing for short jobs\npython -m hero_images.gemini_labeling /path/to/candidates --output-dir /path/to/output --sync --checkpoint-interval 0 --recursive\n```\n\nThe `--model` argument is optional; the default is `gemini-2.5-flash`, but you can also use `gemini-2.5-pro`.\n\n#### Label images with a local VLM\n\n##### vLLM vs. ollama\n\nMy experience was precisely consistent with Internet summaries: vLLM is way faster when running the same model, but basically requires a PhD in Linux to get things working, and in some cases, I just couldn't get some models working, even when they eventually worked on Ollama.  
Ollama is a little slower, but \"just works\", including much more hassle-free management of VRAM.\n\n##### Label images with vLLM\n\n```bash\n# Check GPU memory and get setup instructions\npython -m hero_images.vllm_local_labeling --setup-help\n\n# Start vLLM server (example for Qwen-2.5-VL-7B on 2x4090)\nvllm serve Qwen/Qwen2.5-VL-7B-Instruct \\\n  --host 0.0.0.0 \\\n  --port 8000 \\\n  --data-parallel-size 2 \\\n  --gpu-memory-utilization 0.9 \\\n  --max-model-len 60000\n\n# Start vLLM server (example for Gemma3-12B on 2x4090)\n#\n# Gemma requires acknowledging the license agreement, so you have to\n# sign in to Hugging Face before starting vLLM.\nhf auth login\nvllm serve google/gemma-3-12b-it \\\n    --host 0.0.0.0 \\\n    --port 8000 \\\n    --data-parallel-size 2 \\\n    --gpu-memory-utilization 0.9 \\\n    --max-model-len 5000 \\\n\t--max-num-seqs 1 \\\n\t--max-num-batched-tokens 2048\n\n# Note to self: we're using a smaller value for --max-model-len (maximum context size)\n# for Gemma because the model weights are larger, leaving less VRAM for context. 
Estimated\n# context size for my queries is \u003c5000 (200-300 for text, 1k-2k per image, 100-200 for response),\n# so even 32k is plenty.\n\n# Run labeling (in another terminal)\npython -m hero_images.vllm_local_labeling /path/to/candidates --output-dir /path/to/output --recursive\n```\n\n##### Label images with Ollama\n\n```bash\n# Start Ollama server (a bind error likely indicates the server is already running)\nollama serve\n\n# Pull vision model (in another terminal)\nollama pull gemma3:12b\n\n# Run labeling (with automatic checkpointing every 1000 images)\npython -m hero_images.ollama_local_labeling /path/to/candidates --output-dir /path/to/output --recursive\n\n# Resume from checkpoint if interrupted\npython -m hero_images.ollama_local_labeling /path/to/candidates --output-dir /path/to/output --resume /path/to/output/ollama_local_labels_YYYYMMDD_HHMMSS.tmp.json --recursive\n\n# Disable checkpointing for short jobs\npython -m hero_images.ollama_local_labeling /path/to/candidates --output-dir /path/to/output --checkpoint-interval 0 --recursive\n```\n\nModels to try:\n\n* gemma3:4b (3.3GB)\n* gemma3:12b (8.1GB)\n* gemma3:27b (17GB)\n\n* llava:7b (4.7GB)\n* llava:13b (8.0GB)\n* llava:34b (20GB)\n\n* qwen2.5vl:3b (3.2GB)\n* qwen2.5vl:7b (6.0GB)\n* qwen2.5vl:32b (21GB)\n* qwen2.5vl:72b (49GB)\n\nFor example:\n\n```bash\nexport MODEL_NAME=qwen2.5vl:72b\nollama pull ${MODEL_NAME}\npython -m hero_images.ollama_local_labeling /path/to/candidates --output-dir /path/to/output --model ${MODEL_NAME} --recursive\n```\n\nOther Ollama notes:\n\n* To bind separate servers to separate GPUs:\n\n```bash\n# First instance (GPU 0)\nCUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=localhost:11434 ollama serve\n\n# Second instance (GPU 1)\nCUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=localhost:11435 ollama serve\n```\n\n* If ollama is running as a service, kill it via `sudo systemctl stop ollama`, re-start it with `sudo systemctl start ollama`.  
Disable service auto-start with `sudo systemctl disable ollama`.\n\n* Models are stored in ~/.ollama/models, unless you change the OLLAMA_MODELS environment variable.  If you run ollama via `ollama serve`, you need to set this variable in the shell where you run `ollama serve`.  If ollama is running as a service, follow [these instructions](https://github.com/ollama/ollama/issues/680#issuecomment-2880768673) to change the model download folder.\n\n* List models with `ollama list`\n\n* Remove models with `ollama rm`\n\n* With some models (not even particularly large models) I was getting timeouts during model loading.  This seems sporadic and unrelated to model size, system load, etc.  The following might help, in the shell where you're going to run `ollama serve`:\n\n```bash\nexport OLLAMA_KEEP_ALIVE=1h\nexport OLLAMA_LOAD_TIMEOUT=30m\n```\n\n#### Shared parameters for Gemini labeling\n\n- `--recursive` or `-r` - Search for images recursively in subdirectories\n- `--image-size N` - Maximum dimension for resized images (default: 768)\n- `--model MODEL` - Gemini model to use (default: gemini-2.5-flash)\n- `--auto-confirm` or `-y` - Skip cost confirmation prompt\n- `--sync` - Use synchronous API instead of batch (2x cost, real-time results)\n- `--checkpoint-interval N` - Save progress every N images in sync mode (default: 1000, use 0 to disable)\n- `--resume FILE` - Resume from checkpoint file (*.tmp.json for sync mode) or batch job metadata\n\n#### Shared parameters for local VLM labeling\n\n- `--recursive` or `-r` - Search for images recursively in subdirectories\n- `--image-size N` - Maximum dimension for resized images (default: 768)\n\n\n#### Checkpoint/resume options for local VLM labeling and Gemini sync mode\n\n- `--checkpoint-interval N` - Save progress every N images (default: 1000, use 0 to disable)\n- `--resume FILE.tmp.json` - Resume from specific checkpoint file\n- Checkpoint files are automatically cleaned up on successful completion\n- If process 
crashes, resume with `--resume /path/to/output/filename.tmp.json`\n- Available for: Ollama labeling, vLLM labeling, and Gemini synchronous mode\n\n\n### Visualize results\n\n**Single model visualization**\n\n```bash\npython -m hero_images.generate_label_visualization /path/to/batch_labels_file_name_20250923_143022.json\n```\n\n**Multi-model comparison dashboard**\n\n```bash\n# Generate comparison dashboard for all models in a directory\npython -m hero_images.generate_label_visualization /path/to/results_directory --sample-from /path/to/candidates --sample 100 --random-seed 42\n```\n\nNotes to self:\n\n```bash\npython -m hero_images.generate_label_visualization /mnt/c/temp/hero-images/labels/ --sample 1000 --random-seed 0 --sample-from /mnt/c/temp/hero-images/candidates/heuristics-20250923162520/\npython -m hero_images.generate_label_visualization /mnt/c/temp/hero-images/labels/ --sample 1000 --random-seed 0 --sample-from /mnt/c/temp/hero-images/candidates/heuristics-20250923162520/ --sort-by score\n```\n\nThis creates:\n- Individual HTML files for each *_labels_*.json file found\n- An index.YYYYMMDD_HHMMSS.html dashboard with links to all results\n- A single shared batch_labels_YYYYMMDD_HHMMSS_images/ folder\n\n**Visualization options**\n\n- `--sample N` - Show N randomly sampled images (default: 500)\n- `--random-seed N` - Fix random seed for reproducible sampling (default: 0)\n- `--sample-from PATH` - Sample from specific directory or JSON file for consistent comparison\n- `--sort-by {filename,score}` - Sort by filename (default) or aesthetic score\n- `--top-only` - Show only successful results (exclude failed images)\n\n\n## Gemini batch job management\n\n### Cancel a running batch job\n\nIf you need to stop a batch job (e.g., if it's taking too long or you made an error):\n\n```bash\n# When you interrupt polling with Ctrl+C, the script shows the cancel command:\npython -m hero_images.gemini_labeling --cancel batches/xyz789\n```\n\nCtrl+C only stops the local 
script - the job continues running on Google's servers until cancelled.\n\nCancellation is only available for batch jobs, not synchronous processing.\n\n### Resume batch jobs (running or completed)\n\nIf your script was interrupted or you want to retrieve results from a completed job:\n\n```bash\npython -m hero_images.gemini_labeling --resume /path/to/gemini_batch_metadata_YYYYMMDD_HHMMSS.json\n```\n\nResume behavior:\n\n- **Running jobs**: Continues polling until completion\n- **Completed jobs**: Immediately retrieves and saves results\n- **Failed/cancelled jobs**: Shows status and exits\n\n### Resume synchronous jobs from checkpoint\n\nSynchronous mode now supports checkpointing (like Ollama labeling):\n\n```bash\npython -m hero_images.gemini_labeling /path/to/candidates --output-dir /path/to/output --sync --resume /path/to/output/gemini_sync_labels_YYYYMMDD_HHMMSS.tmp.json --recursive\n```\n\nCheckpoint behavior:\n- Automatically saves progress every 1000 images (configurable with `--checkpoint-interval`)\n- Skips already-processed images when resuming\n- Cleans up checkpoint file on successful completion\n- Use `--checkpoint-interval 0` to disable for short jobs\n\n\n## Data pipeline\n\n```\nRaw images\n    ↓ (MegaDetector + SpeciesNet)\nDetection and classification results\n    ↓ (Sequence-aware stratified sampling)\nCandidates (N diverse images)\n    ↓ (Gemini 2.5 Flash or Local VLM Labeling)\nLabeled dataset (0-10 aesthetic scores)\n```\n\n\n## Future work\n\n- **Parallelization**: Currently most of the models I'm using via ollama don't quite max out 1 GPU, and the other is sitting idle.  
Allow request submission across two threads.\n- **Heuristic improvement**: Revisit sampling heuristics, which were originally designed to get a range of image quality for training, but I'm using the pipeline now with the intention of just finding good images\n- **Prompt engineering**: Try a variety of prompts, consider few-shot training, add more wildlife-specific criteria (e.g. \"eye contact with camera\", \"different species interacting\", etc.).\n- **Hyperparameter tuning**: Experiment with temperature, max_tokens\n- **VLM comparison**: Compare quality between Gemini models and local VLMs, e.g. highlighting images with significant disagreement\n- **Human labeling**: Implement Labelme integration for human validation\n\n\n## Technical notes\n\n- **Image preprocessing**: All methods resize to 768px max dimension by default\n- **Output compatibility**: All labeling scripts produce identical JSON format for visualization\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagentmorris%2Fhero-images","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagentmorris%2Fhero-images","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagentmorris%2Fhero-images/lists"}