{"id":49824138,"url":"https://github.com/padosoft/product_image_discovery","last_synced_at":"2026-05-23T14:01:04.837Z","repository":{"id":354756434,"uuid":"1224779297","full_name":"padosoft/product_image_discovery","owner":"padosoft","description":"Product Image Discovery \u0026 Verification Module - search pipeline, verification, download, scoring e publicity of product image, with Laravel, ai sdk, Sanctum, Horizon, MySQL and Redis","archived":false,"fork":false,"pushed_at":"2026-05-23T12:05:28.000Z","size":3835,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-23T14:00:24.785Z","etag":null,"topics":["ai","ai-scoring","laravel","product-search"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/padosoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/ROADMAP_SEARCH_PROVIDERS.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-29T16:08:40.000Z","updated_at":"2026-05-23T12:05:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"d988102e-973d-4d9e-883b-7bd7e7460239","html_url":"https://github.com/padosoft/product_image_discovery","commit_stats":null,"previous_names":["padosoft/product_image_discovery"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/padosoft/product_image_discovery","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padosoft%2Fproduct_image_discovery
","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padosoft%2Fproduct_image_discovery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padosoft%2Fproduct_image_discovery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padosoft%2Fproduct_image_discovery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/padosoft","download_url":"https://codeload.github.com/padosoft/product_image_discovery/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padosoft%2Fproduct_image_discovery/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33398391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T04:15:53.637Z","status":"ssl_error","status_checked_at":"2026-05-23T04:15:53.242Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-scoring","laravel","product-search"],"created_at":"2026-05-13T14:07:54.554Z","updated_at":"2026-05-23T14:01:04.828Z","avatar_url":"https://github.com/padosoft.png","language":"PHP","funding_links":[],"categories":["Built with Regolo"],"sub_categories":[],"readme":"# Product Image Discovery\n\n[![Latest Version on 
Packagist](https://img.shields.io/packagist/v/padosoft/product-image-discovery.svg?style=flat-square)](https://packagist.org/packages/padosoft/product-image-discovery)\n[![PHP](https://img.shields.io/badge/PHP-8.3%2B-777bb4.svg?style=flat-square)](https://www.php.net/)\n[![Laravel](https://img.shields.io/badge/Laravel-13.x-ff2d20.svg?style=flat-square)](https://laravel.com/)\n[![License](https://img.shields.io/packagist/l/padosoft/product-image-discovery.svg?style=flat-square)](LICENSE)\n[![Tests](https://img.shields.io/badge/tests-PHPUnit%20%2B%20Node-brightgreen.svg?style=flat-square)](#testing)\n\n![Product Image Discovery banner](resources/banner.png)\n\n## Table of Contents\n\n- [Responsible Use Disclaimer](#responsible-use-disclaimer)\n- [Why This Package](#why-this-package)\n- [Quick Start (5 minutes, junior-friendly)](#quick-start-5-minutes-junior-friendly)\n- [Supported Search Providers](#supported-search-providers)\n- [What It Does](#what-it-does)\n- [Architecture](#architecture)\n- [Web Admin UI](#web-admin-ui)\n- [Request Flow](#request-flow)\n- [Installation](#installation)\n- [Live Smoke Test From A Fresh Laravel App](#live-smoke-test-from-a-fresh-laravel-app)\n- [Debug Flow Command](#debug-flow-command)\n- [Quickstart](#quickstart)\n- [EAN / Barcode Matching](#ean--barcode-matching)\n- [Real Product Payload Examples](#real-product-payload-examples)\n- [Configuration](#configuration)\n- [Trusted Sources](#trusted-sources)\n- [Optional Playwright Sidecar](#optional-playwright-sidecar)\n- [AI And Vision](#ai-and-vision)\n- [Testing](#testing)\n- [Database Tables](#database-tables)\n- [Safety Notes](#safety-notes)\n- [Admin UI Guidance](#admin-ui-guidance)\n- [Roadmap](#roadmap)\n- [Contributing](#contributing)\n- [License \u0026 credits](#license--credits)\n\nFind the right product image, not just any image.\n\n`padosoft/product-image-discovery` is a Laravel package for discovering, verifying, scoring and preparing product images from supplier data, 
search providers and trusted sources. It is built for catalog teams, ERPs, PIMs and marketplaces where the expensive mistake is not \"we found no image\"; the expensive mistake is publishing the wrong image for a product-color variant.\n\nThe package gives you a conservative pipeline, an API for ingestion and review, database-backed configuration, queue-ready jobs, audit events and an optional Playwright sidecar for pages that need browser rendering.\n\n## Responsible Use Disclaimer\n\nUse this package only for lawful, authorized product image discovery workflows. Do not use it to abuse third-party services, bypass access controls, overload websites, violate robots.txt or source terms, or collect images from sources where you do not have explicit permission or another valid legal basis. Configure trusted sources, rate limits and manual review policies conservatively, and use it only on websites, suppliers, brand sources or search providers that you are allowed to access for this purpose.\n\n## Why This Package\n\n- Conservative by design: it optimizes for low false positives.\n- Product-color aware: the main identity is `client_id + erp_model_color_id`.\n- Explainable decisions: candidates carry source, score, quality and audit context.\n- Laravel native: service provider, config, migrations, Eloquent models, form requests, resources, Sanctum-friendly middleware and queue jobs.\n- Provider-ready: search providers are configured in the database and resolved through a manager.\n- Browser optional: Playwright runs in a separate Node sidecar and is not required for basic usage.\n- AI-assisted, not AI-dependent: optional LLM/vision verification can enrich decisions without making the core fragile.\n- Testable offline: the default test suite uses SQLite, fake providers and deterministic sidecar tests.\n\n## Quick Start (5 minutes, junior-friendly)\n\nGoal: from a fresh Laravel 13 app to a passing end-to-end discovery request, **without paid keys**, copy-pasting nine 
blocks in order. The pipeline runs synchronously and returns a stored candidate from a bundled fake provider, so you can see the full flow before plugging in any external API.\n\n\u003e Prerequisites: PHP 8.3+, Composer, a clean Laravel 13 app, ~5 minutes. No queue worker, no Redis, no Node, no API keys.\n\n**1. Require the package**\n\n```bash\ncomposer require padosoft/product-image-discovery\n```\n\n**2. Publish config**\n\n```bash\nphp artisan vendor:publish --tag=product-image-discovery-config\n```\n\n**3. Publish migrations**\n\n```bash\nphp artisan vendor:publish --tag=product-image-discovery-migrations\n```\n\n**4. Create the SQLite database file**\n\n```bash\ntouch database/database.sqlite\n```\n\nOn Windows PowerShell:\n\n```powershell\nNew-Item -ItemType File database/database.sqlite -Force\n```\n\n**5. Migrate**\n\n```bash\nphp artisan migrate\n```\n\n**6. Seed default settings + provider templates**\n\n```bash\nphp artisan db:seed --class=\"Padosoft\\ProductImageDiscovery\\Database\\Seeders\\ProductImageDiscoveryDefaultsSeeder\"\n```\n\n**7. Minimal `.env` overrides** (append four lines)\n\n```env\nDB_CONNECTION=sqlite\nDB_DATABASE=database/database.sqlite\nQUEUE_CONNECTION=sync\nPRODUCT_IMAGE_DISCOVERY_ROUTE_PREFIX=api/product-image-discovery\n```\n\n**8. 
Activate the bundled fake provider + issue a token via tinker**\n\n```bash\nphp artisan tinker\n```\n\nInside Tinker, paste this single block:\n\n```php\n\\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::updateOrCreate(\n    ['code' =\u003e 'fake-smoke'],\n    [\n        'name' =\u003e 'Fake Smoke Provider',\n        'driver' =\u003e 'fake',\n        'config' =\u003e [\n            'supports_image_search' =\u003e true,\n            'image_results' =\u003e [[\n                'title' =\u003e 'Demo result',\n                'page_url' =\u003e 'https://example.test/p/demo',\n                'image_url' =\u003e 'data:image/jpeg;base64,'.base64_encode(str_repeat('a', 120000)),\n                'source_domain' =\u003e 'example.test',\n                'width' =\u003e 1200,\n                'height' =\u003e 1200,\n                'provider_metadata' =\u003e [\n                    'inline_image_base64' =\u003e base64_encode(str_repeat('a', 120000)),\n                    'inline_extension' =\u003e 'jpg',\n                ],\n            ]],\n        ],\n        'priority' =\u003e 1,\n        'timeout_seconds' =\u003e 10,\n        'is_active' =\u003e true,\n    ],\n);\n\nuse Laravel\\Sanctum\\HasApiTokens;\n$user = \\App\\Models\\User::factory()-\u003ecreate(['email' =\u003e 'pid-quickstart@example.test']);\necho $user-\u003ecreateToken('pid-quickstart', ['product-image-discovery:write','product-image-discovery:read'])-\u003eplainTextToken.PHP_EOL;\n```\n\nCopy the printed token. Exit tinker.\n\n\u003e If `App\\Models\\User` does not use `Laravel\\Sanctum\\HasApiTokens`, add the trait first — see step 4 in [Live Smoke Test From A Fresh Laravel App](#live-smoke-test-from-a-fresh-laravel-app).\n\n**9. 
Hit the API** (replace `YOUR_TOKEN`)\n\n```bash\nphp artisan serve\n```\n\nIn another terminal:\n\n```bash\ncurl -X POST \"http://127.0.0.1:8000/api/product-image-discovery/requests\" \\\n  -H \"Authorization: Bearer YOUR_TOKEN\" \\\n  -H \"Accept: application/json\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"client_id\": 1,\n    \"erp_model_id\": \"DEMO-1\",\n    \"erp_model_color_id\": \"DEMO-1-BLACK\",\n    \"brand\": \"Demo\",\n    \"model_code\": \"DEMO-1\",\n    \"color_name\": \"Black\"\n  }'\n```\n\nYou should receive `{\"ok\":true, \"request_id\":1, \"status\":\"queued\"}`.\n\n✅ Done — you just ran the full ingest → search → extract → verify → download → quality pipeline locally with a deterministic fake provider. Next steps:\n\n- **Enable a real provider** (Brave, Tavily, Exa.ai, Firecrawl, WebSearchAPI, DuckDuckGo): see [Supported Search Providers](#supported-search-providers) below.\n- **Run the live debug flow** with a real product payload: see [Debug Flow Command](#debug-flow-command).\n- **Full host-app walkthrough** with Sanctum scopes and Brave: see [Live Smoke Test From A Fresh Laravel App](#live-smoke-test-from-a-fresh-laravel-app).\n\n## Supported Search Providers\n\nOut of the box, the package ships with **7 search providers** ready to plug in: 1 deterministic for tests + 6 live drivers covering global, EU-friendly, free-tier, and scrape-friendly options. 
All providers are stored in `product_image_search_providers` and resolved through a single `SearchProviderManager`, so swapping one for another is a row update.\n\n| Provider | Driver | Image search | Site filter | Free tier | Docs |\n|---|---|---|---|---|---|\n| Fake (deterministic) | `fake` | ✅ | ✅ | — | n/a — bundled for tests |\n| Brave Search | `brave` | ✅ | ✅ | 2000 / month | \u003chttps://api-dashboard.search.brave.com/app/documentation\u003e |\n| Tavily | `tavily` | ✅ | ✅ (`include_domains`) | 1000 credits / month | \u003chttps://docs.tavily.com\u003e |\n| Exa.ai | `exa` | ✅ (`extras.imageLinks`) | ✅ (`includeDomains`) | Free trial credits | \u003chttps://docs.exa.ai\u003e |\n| Firecrawl | `firecrawl` | ✅ (`sources:[\"images\"]`) | ✅ (via `site:` operator) | 500 credits / month | \u003chttps://docs.firecrawl.dev/api-reference/v2-endpoint/search\u003e |\n| WebSearchAPI.ai | `websearchapi` | ❌ (web-only) | ✅ (`includeDomains`) | Free trial credits | \u003chttps://websearchapi.ai/docs/search-api\u003e |\n| DuckDuckGo (HTML lite) | `duckduckgo` | ❌ | ✅ (via `site:` operator) | No key required | \u003chttps://duckduckgo.com/html/\u003e |\n\n\u003e Templates for `serpapi` and `google_custom_search` are seeded but not yet implemented — see [Roadmap](#roadmap).\n\n\u003e ⚠️ Provider configs are redacted in audit logs. 
Always store secrets in `.env` and let the seeders/tinker scripts populate `api_key_encrypted`; never expose API keys through user-facing endpoints.\n\n### Brave Search\n\n```env\nBRAVE_SEARCH_API_KEY=your-key\n```\n\nActivate the seeded provider:\n\n```php\n$p = \\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::where('code', 'brave')-\u003efirstOrFail();\n$p-\u003eapi_key_encrypted = env('BRAVE_SEARCH_API_KEY');\n$p-\u003eis_active = true;\n$p-\u003esave();\n```\n\n### Tavily\n\nJSON search API with first-class image support (`images[]`) and `include_domains` site filtering.\n\n```env\nTAVILY_API_KEY=your-key\nTAVILY_URL=https://api.tavily.com\n```\n\nActivate:\n\n```php\n$p = \\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::where('code', 'tavily')-\u003efirstOrFail();\n$p-\u003eapi_key_encrypted = env('TAVILY_API_KEY');\n$p-\u003eis_active = true;\n$p-\u003esave();\n```\n\n### Exa.ai\n\n`POST /search` with `contents.extras.imageLinks` per result, flattened into one candidate per image URL. Site filter via `includeDomains`. Auth via `x-api-key` header.\n\n```env\nEXA_API_KEY=your-key\nEXA_URL=https://api.exa.ai\n```\n\nActivate:\n\n```php\n$p = \\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::where('code', 'exa')-\u003efirstOrFail();\n$p-\u003eapi_key_encrypted = env('EXA_API_KEY');\n$p-\u003eis_active = true;\n$p-\u003esave();\n```\n\nThe `image_links_per_result` config key (default `5`) caps how many `imageLinks` Exa returns per result and how many candidates the provider emits per Exa result.\n\n### Firecrawl\n\n`POST /v2/search` with `sources:[{type:\"images\"}]` (or `[{type:\"web\"}]` for `searchWeb()`). Bearer auth. Site filter propagated as `includeDomains` array. 
Returns `data.images[]` with `imageUrl`, `imageWidth`, `imageHeight`, and the source page `url`.\n\n```env\nFIRECRAWL_API_KEY=your-key\nFIRECRAWL_URL=https://api.firecrawl.dev\n```\n\nActivate:\n\n```php\n$p = \\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::where('code', 'firecrawl')-\u003efirstOrFail();\n$p-\u003eapi_key_encrypted = env('FIRECRAWL_API_KEY');\n$p-\u003eis_active = true;\n$p-\u003esave();\n```\n\n### WebSearchAPI.ai\n\n`POST /ai-search` with Bearer auth. Google-backed organic web results with optional AI content extraction. Site filter via `includeDomains`. **Web-only**: WebSearchAPI does not expose a dedicated image search endpoint, so `supportsImageSearch()` returns `false` and `SearchProviderManager` skips this driver for image queries (the extraction pipeline can still harvest images from the returned pages).\n\n```env\nWEBSEARCHAPI_API_KEY=your-key\nWEBSEARCHAPI_URL=https://api.websearchapi.ai\n```\n\nActivate:\n\n```php\n$p = \\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::where('code', 'websearchapi')-\u003efirstOrFail();\n$p-\u003eapi_key_encrypted = env('WEBSEARCHAPI_API_KEY');\n$p-\u003eis_active = true;\n$p-\u003esave();\n```\n\n### DuckDuckGo (HTML lite)\n\nFree, no-API-key fallback for web search. Posts to `https://html.duckduckgo.com/html/` and parses the response with `\\DOMDocument` + `\\DOMXPath`. Result `.result__a` links use the `//duckduckgo.com/l/?uddg=\u003cencoded\u003e` redirect form — the provider decodes them transparently and returns the real destination URL. 
**No image search**: `supportsImageSearch()` returns `false`, so `SearchProviderManager` skips it for image queries automatically.\n\n```env\n# Optional override (defaults to https://html.duckduckgo.com):\nDUCKDUCKGO_URL=https://html.duckduckgo.com\n```\n\nActivate:\n\n```php\n$p = \\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::where('code', 'duckduckgo')-\u003efirstOrFail();\n$p-\u003ebase_url = 'https://html.duckduckgo.com';\n$p-\u003eis_active = true;\n$p-\u003esave();\n```\n\n\u003e Use it sparingly and respect DuckDuckGo's terms — it is best as a low-volume fallback, not a primary driver. DuckDuckGo applies anti-bot rate limits to shared/datacenter IPs; the live E2E test self-skips in CI and on 403/429/503 responses.\n\n### Fake provider\n\nDeterministic, no network. Use it for local smoke tests, CI and unit tests. See the [Quick Start](#quick-start-5-minutes-junior-friendly) above for an example configuration that feeds an inline base64 image directly into the download/quality steps.\n\n## What It Does\n\n- Ingests product identity payloads from ERP, PIM or catalog systems.\n- Generates targeted search queries from brand, model, SKU, supplier SKU, EAN and color.\n- Searches configurable providers.\n- Extracts image candidates from search results, structured data, Open Graph tags and gallery-like markup.\n- Deduplicates candidates by stable fingerprints.\n- Scores candidates against product identity, source trust and image quality.\n- Downloads and stores accepted candidate assets.\n- Routes uncertain matches to manual review.\n- Records audit events for decisions and retries.\n\n## Architecture\n\nThe package is split into small layers so you can replace the parts that touch infrastructure:\n\n- **API layer**: `/api/product-image-discovery/...` endpoints for request ingestion, search, candidate review and configuration.\n- **Persistence layer**: migrations and Eloquent models for requests, candidates, source pages, settings, trusted 
sources, providers and audit events.\n- **Pipeline layer**: queue jobs for ingest, search, extraction, verification, download and quality assessment.\n- **Search layer**: provider definitions, database repository, provider manager and provider factories.\n- **Decision layer**: deterministic scoring, anti-false-positive checks and quality thresholds.\n- **Sidecar layer**: optional Node service for rendering JavaScript-heavy product pages with Playwright.\n\n## Web Admin UI\n\nNeed a ready-made back office for this package? The professional Laravel admin is already available as a sister repository:\n\n**[`padosoft/product_image_discovery_admin`](https://github.com/padosoft/product_image_discovery_admin)**\n\nIt provides an operational dashboard, request review queues, candidate comparison, protected image previews, approve/reject/retry actions, provider and trusted-source configuration, debug-flow execution, health checks, report inspection, API workbench tooling, CSV export, saved demo filters and a GitHub Actions release gate.\n\n![Product Image Discovery Admin dashboard](resources/ProductImageSearch-dashboard.png)\n\n## Request Flow\n\n```mermaid\nflowchart TD\n    A[ERP / PIM / Catalog sends product payload] --\u003e B[POST /api/product-image-discovery/requests]\n    B --\u003e C[Validate payload with StoreProductImageDiscoveryRequest]\n    C --\u003e D[Upsert discovery request by client_id + erp_model_color_id]\n    D --\u003e E[Store full payload in raw_payload]\n    E --\u003e F[Dispatch configured ingest job]\n    F --\u003e G[IngestProductImageDiscoveryJob]\n    G --\u003e H{Payload or request id?}\n    H --\u003e|Raw payload| I[Normalize ProductIdentityData]\n    H --\u003e|Request id| J[Resume persisted request]\n    I --\u003e K[Mark request queued]\n    J --\u003e K\n    K --\u003e L[Dispatch SearchProductImageJob]\n    L --\u003e M[Generate deterministic search queries]\n    M --\u003e N[SearchProviderManager executes active providers]\n    N --\u003e 
O{Results found?}\n    O --\u003e|No| P[Mark no_candidates_found]\n    O --\u003e|Yes| Q[Store search context and mark candidates_found]\n    Q --\u003e R[Dispatch ExtractCandidateSourcesJob]\n    R --\u003e S[Create source pages and candidate images]\n    S --\u003e T[Deduplicate by request_id + fingerprint]\n    T --\u003e U[Dispatch VerifyCandidateImageJob]\n    U --\u003e V[Score source, text, structured data and hard rejection reasons]\n    V --\u003e W[Dispatch DownloadCandidateImageJob]\n    W --\u003e X[Download or persist inline image data]\n    X --\u003e Y[Dispatch AssessImageQualityJob]\n    Y --\u003e Z[Measure dimensions, size and quality signals]\n    Z --\u003e AA{Decision threshold}\n    AA --\u003e|Strong match| AB[Candidate quality_passed / ready for selection]\n    AA --\u003e|Uncertain| AC[Manual review]\n    AA --\u003e|Weak or unsafe| AD[Rejected with reason]\n    AB --\u003e AE[Audit event + API review endpoints]\n    AC --\u003e AE\n    AD --\u003e AE\n```\n\n## Installation\n\nRequirements:\n\n- PHP 8.3 or newer.\n- Laravel 13.\n- Composer.\n- A database supported by Laravel. SQLite is enough for a local smoke test.\n- A queue driver. `sync` is easiest for a first test; Redis/Horizon is better for production.\n\n### 1. Require the package\n\n```bash\ncomposer require padosoft/product-image-discovery\n```\n\nIf you are testing directly from GitHub before Packagist is updated, add the repository first:\n\n```bash\ncomposer config repositories.product-image-discovery vcs https://github.com/padosoft/product_image_discovery.git\ncomposer require padosoft/product-image-discovery:0.1.0\n```\n\n### 2. 
Review the env examples\n\nThe repository ships two examples:\n\n- `.env.example`: useful for a fresh Laravel demo app or for package development.\n- `sidecar/.env.example`: useful when running the optional Node/Playwright sidecar.\n\nFor a local smoke test, the important host-app values are:\n\n```env\nDB_CONNECTION=sqlite\nDB_DATABASE=database/database.sqlite\nQUEUE_CONNECTION=sync\nFILESYSTEM_DISK=local\nPRODUCT_IMAGE_DISCOVERY_ROUTE_PREFIX=api/product-image-discovery\nPRODUCT_IMAGE_DISCOVERY_STORAGE_DISK=local\nPRODUCT_IMAGE_DISCOVERY_DEBUG_STOP_ON_FIRST_GOOD=true\nPRODUCT_IMAGE_DISCOVERY_DEBUG_GOOD_SCORE_THRESHOLD=65\n```\n\n### 3. Publish the config\n\n```bash\nphp artisan vendor:publish --tag=product-image-discovery-config\n```\n\nThis creates:\n\n```text\nconfig/product-image-discovery.php\n```\n\n### 4. Publish the migrations\n\n```bash\nphp artisan vendor:publish --tag=product-image-discovery-migrations\n```\n\n### 5. Run migrations\n\n```bash\nphp artisan migrate\n```\n\n### 6. Seed default settings and provider templates\n\n```bash\nphp artisan db:seed --class=\"Padosoft\\ProductImageDiscovery\\Database\\Seeders\\ProductImageDiscoveryDefaultsSeeder\"\n```\n\nThe seeder creates default matching thresholds, quality settings and disabled provider templates such as Brave, SerpAPI and Google Custom Search.\n\n### 7. Configure Sanctum abilities\n\nThe API middleware expects token abilities like:\n\n```text\nproduct-image-discovery:read\nproduct-image-discovery:write\nproduct-image-discovery:review\nproduct-image-discovery:settings\nproduct-image-discovery:admin\n```\n\nFor a back-office integration, give operators `read` and `review`; give system ingestion tokens `write`; reserve `settings` and `admin` for trusted maintainers.\n\n### 8. 
Configure queues\n\nBy default, jobs use dedicated queue names:\n\n```php\n'queues' =\u003e [\n    'ingest' =\u003e 'image-discovery-ingest',\n    'search' =\u003e 'image-discovery-search',\n    'extract' =\u003e 'image-discovery-extract',\n    'verify' =\u003e 'image-discovery-verify',\n    'download' =\u003e 'image-discovery-download',\n    'quality' =\u003e 'image-discovery-quality',\n],\n```\n\nRun your Laravel queue workers as usual:\n\n```bash\nphp artisan queue:work\n```\n\nIf you use Horizon, map these queues in `config/horizon.php`.\n\n## Live Smoke Test From A Fresh Laravel App\n\nThis path is intentionally explicit so a junior developer can prove the package works in a real Laravel application without setting up Redis, MySQL or a paid search API.\n\n### 1. Create a clean Laravel app\n\n```bash\ncomposer create-project laravel/laravel product-image-discovery-demo \"^13.0\"\ncd product-image-discovery-demo\n```\n\n### 2. Install the package from GitHub tag `v0.1.0`\n\n```bash\ncomposer config repositories.product-image-discovery vcs https://github.com/padosoft/product_image_discovery.git\ncomposer require padosoft/product-image-discovery:0.1.0\n```\n\n### 3. Configure `.env`\n\nCreate the SQLite database file:\n\n```bash\ntouch database/database.sqlite\n```\n\nOn Windows PowerShell:\n\n```powershell\nNew-Item -ItemType File database/database.sqlite -Force\n```\n\nSet these values in the Laravel app `.env`:\n\n```env\nAPP_URL=http://127.0.0.1:8000\nDB_CONNECTION=sqlite\nDB_DATABASE=database/database.sqlite\nQUEUE_CONNECTION=sync\nFILESYSTEM_DISK=local\nPRODUCT_IMAGE_DISCOVERY_ROUTE_PREFIX=api/product-image-discovery\nPRODUCT_IMAGE_DISCOVERY_STORAGE_DISK=local\nPRODUCT_IMAGE_DISCOVERY_DEBUG_STOP_ON_FIRST_GOOD=true\nPRODUCT_IMAGE_DISCOVERY_DEBUG_GOOD_SCORE_THRESHOLD=65\n```\n\nThen generate the app key:\n\n```bash\nphp artisan key:generate\n```\n\n### 4. 
Install Sanctum tables and enable API tokens\n\n```bash\nphp artisan vendor:publish --provider=\"Laravel\\Sanctum\\SanctumServiceProvider\"\n```\n\nIn `app/Models/User.php`, make sure the model uses Sanctum tokens:\n\n```php\nuse Laravel\\Sanctum\\HasApiTokens;\n\nclass User extends Authenticatable\n{\n    use HasApiTokens;\n}\n```\n\nKeep any existing traits such as `HasFactory` and `Notifiable`; just add `HasApiTokens`.\n\n### 5. Publish package files and migrate\n\n```bash\nphp artisan vendor:publish --tag=product-image-discovery-config\nphp artisan vendor:publish --tag=product-image-discovery-migrations\nphp artisan migrate\nphp artisan db:seed --class=\"Padosoft\\ProductImageDiscovery\\Database\\Seeders\\ProductImageDiscoveryDefaultsSeeder\"\n```\n\n### 6. Create a test API token\n\n```bash\nphp artisan tinker\n```\n\nInside Tinker:\n\n```php\n$user = \\App\\Models\\User::factory()-\u003ecreate(['email' =\u003e 'pid-demo@example.test']);\n\n$token = $user-\u003ecreateToken('pid-demo', [\n    'product-image-discovery:read',\n    'product-image-discovery:write',\n    'product-image-discovery:review',\n    'product-image-discovery:settings',\n    'product-image-discovery:admin',\n])-\u003eplainTextToken;\n\n$token;\n```\n\nCopy the printed token for the `Authorization: Bearer ...` header.\n\n### 7. 
Add a deterministic fake provider\n\nThis provider lets you test the whole API and queue path without a paid search API:\n\n```bash\nphp artisan tinker\n```\n\nInside Tinker:\n\n```php\n\\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::updateOrCreate(\n    ['code' =\u003e 'fake-smoke'],\n    [\n        'name' =\u003e 'Fake Smoke Provider',\n        'driver' =\u003e 'fake',\n        'base_url' =\u003e 'https://example.test',\n        'config' =\u003e [\n            'supports_image_search' =\u003e true,\n            'supports_site_filter' =\u003e true,\n            'image_results' =\u003e [[\n                'title' =\u003e 'Nike Air Force 1 07 White White',\n                'page_url' =\u003e 'https://www.nike.com/t/air-force-1-07-mens-shoes-jBrhbr',\n                'image_url' =\u003e 'data:image/jpeg;base64,'.base64_encode(str_repeat('a', 120000)),\n                'source_domain' =\u003e 'nike.com',\n                'width' =\u003e 1200,\n                'height' =\u003e 1200,\n                'provider_metadata' =\u003e [\n                    'inline_image_base64' =\u003e base64_encode(str_repeat('a', 120000)),\n                    'inline_extension' =\u003e 'jpg',\n                ],\n            ]],\n        ],\n        'priority' =\u003e 1,\n        'timeout_seconds' =\u003e 10,\n        'is_active' =\u003e true,\n    ],\n);\n```\n\n### 8. Start the app\n\n```bash\nphp artisan serve\n```\n\n### 9. 
Send a real API request\n\nReplace `YOUR_TOKEN` with the Sanctum token from step 6:\n\n```bash\ncurl -X POST \"http://127.0.0.1:8000/api/product-image-discovery/requests\" \\\n  -H \"Authorization: Bearer YOUR_TOKEN\" \\\n  -H \"Accept: application/json\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"client_id\": 1,\n    \"erp_model_id\": \"NIKE-AF1-07\",\n    \"erp_model_color_id\": \"NIKE-AF1-07-CW2288-111\",\n    \"brand\": \"Nike\",\n    \"supplier\": \"Nike\",\n    \"supplier_sku\": \"CW2288-111\",\n    \"model_code\": \"Air Force 1 07\",\n    \"color_code\": \"CW2288-111\",\n    \"color_name\": \"White\",\n    \"category\": \"Sneakers\",\n    \"material\": \"Leather\"\n  }'\n```\n\nThe same payload is available as a ready-to-edit JSON file:\n\n```bash\ncurl -X POST \"http://127.0.0.1:8000/api/product-image-discovery/requests\" \\\n  -H \"Authorization: Bearer YOUR_TOKEN\" \\\n  -H \"Accept: application/json\" \\\n  -H \"Content-Type: application/json\" \\\n  --data @examples/requests/nike-air-force-1-live.json\n```\n\nYou should receive a JSON response with `ok: true` and a `request_id`. Because `QUEUE_CONNECTION=sync`, the pipeline runs during the request cycle.\n\nCheck the stored request:\n\n```bash\ncurl \"http://127.0.0.1:8000/api/product-image-discovery/requests/1\" \\\n  -H \"Authorization: Bearer YOUR_TOKEN\" \\\n  -H \"Accept: application/json\"\n```\n\n### 10. 
Optional: activate Brave for a real external search\n\nAdd your key to `.env`:\n\n```env\nBRAVE_SEARCH_API_KEY=your-real-key\n```\n\nThen activate the seeded Brave provider:\n\n```bash\nphp artisan tinker\n```\n\nInside Tinker:\n\n```php\n$provider = \\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::where('code', 'brave')-\u003efirstOrFail();\n$provider-\u003eapi_key_encrypted = env('BRAVE_SEARCH_API_KEY');\n$provider-\u003eis_active = true;\n$provider-\u003esave();\n```\n\nDisable the fake provider when you want only live search results:\n\n```php\n\\Padosoft\\ProductImageDiscovery\\Models\\ProductImageSearchProvider::where('code', 'fake-smoke')-\u003eupdate(['is_active' =\u003e false]);\n```\n\n## Debug Flow Command\n\nThe package includes a console command for full live debugging from a request JSON file. It runs the same pipeline jobs as the queue flow and streams a step-by-step console trace: ingest, generated queries, sites/images found, candidate verification order, every candidate examined, score components, deterministic evidence, optional AI output, download path, hash, quality analysis, errors and final decision.\n\nBefore verification, the command ranks candidates with deterministic scoring, so constrained runs start from the strongest product identity match and avoid spending live AI calls on obviously wrong-color or wrong-model candidates first. Use `--report=...` to keep the complete JSON report on disk; use `--json` when you need machine-readable output instead of the live console trace.\n\nWhere to run it:\n\n- Inside a host Laravel app that installed the package, use `php artisan ...`.\n- Inside this package repository, there is no `artisan` file. 
Use Orchestra Testbench through `vendor/bin/testbench`.\n\nWhat you see on screen:\n\n- ASCII art header, so debug runs are easy to spot in terminal history.\n- Request ingest: JSON file path, `client_id`, `erp_model_color_id`, brand, model and color identity.\n- Search step: provider used, generated queries, executed query attempts, query weights, result count and provider attempts.\n- Found sites and images: source domain, page URL, image URL, title and image dimensions for each provider result.\n- Extraction step: candidate ids and source pages retained from the search results.\n- Candidate plan: deterministic debug rank and the exact order in which candidates will be examined.\n- Per-candidate verification: candidate URL, source page, source policy, score components, final score, matches, mismatches, strong matches and rejection reason.\n- AI verification output when enabled: provider, model, status, match flags, confidence, brand/model/color/type/quality booleans, AI rejection reason, notes and errors.\n- Download step: selected candidate id, remote image URL, local storage path, MIME type, bytes and SHA-256 hash.\n- Quality analysis: pass/fail, quality score, dimensions, MIME type and quality issues.\n- Final decision: request status, selected/best candidate id, final score, verified match count and report path.\n- Audit events: persisted event type, level, candidate id and JSON context for later inspection.\n\n```bash\nphp artisan product-image-discovery:debug-flow examples/requests/herno-cappa-nylon-ultralight-cammello.json\n```\n\nThat `php artisan` form only works from a Laravel app root. 
If you run it from the package root and see `Could not open input file: artisan`, use the Testbench commands in the Herno example below.\n\nUseful options:\n\n```bash\nphp artisan product-image-discovery:debug-flow examples/requests/herno-cappa-nylon-ultralight-cammello.json \\\n  --fresh \\\n  --max-candidates=10 \\\n  --report=storage/app/product-image-discovery/debug/herno-flow.json\n```\n\n- `--fresh`: deletes the existing request for the same `client_id + erp_model_color_id` before running.\n- `--max-candidates=10`: limits how many discovered candidates are verified in this debug run.\n- `--report=...`: writes the complete JSON report to disk while still printing the formatted console output.\n- `--json`: prints only the JSON report and disables the live console trace.\n- `--no-download`: skips download and quality assessment.\n- `--download-all`: downloads and quality-assesses every verified candidate; by default only the best verified candidate is downloaded.\n- `--clean-storage`: deletes the `product-image-discovery/{request_id}` storage directory before downloading, useful when repeating debug runs.\n- `--stop-on-first-good`: stops verifying more candidates after a good verified candidate is found.\n- `--exhaustive`: verifies every candidate up to `--max-candidates`, ignoring the early-stop setting.\n- `--good-score=65`: overrides the score threshold used by early stop.\n- `--migrate`: runs migrations first, useful in local demo/Testbench environments.\n- `--no-env-brave`: disables automatic creation of a `brave-live-debug` provider from `BRAVE_SEARCH_API_KEY`.\n- `--fail-on-no-match`: exits with a failure code when no candidate reaches `verified_match`.\n\nEarly stop is controlled by:\n\n```env\nPRODUCT_IMAGE_DISCOVERY_DEBUG_STOP_ON_FIRST_GOOD=true\nPRODUCT_IMAGE_DISCOVERY_DEBUG_GOOD_SCORE_THRESHOLD=65\n```\n\nWith early stop enabled, the command stops candidate verification when a verified candidate is good enough because it comes from an 
auto-publish/trusted source, the source domain contains the brand, or its final score reaches the configured threshold. Use `--exhaustive` when you intentionally want to inspect all candidates up to `--max-candidates`.\n\n`--fresh` also cleans the storage directory for the matching old request ids and the new debug request id. This matters especially when running from the package with Testbench: the database is often SQLite in-memory, so request ids can restart from `1` while physical files under `vendor/orchestra/testbench-core/laravel/storage/...` remain from older debug or live test runs.\n\nTo inspect and download every verified candidate in a broad run, combine:\n\n```bash\nphp artisan product-image-discovery:debug-flow examples/requests/herno-cappa-nylon-ultralight-cammello.json \\\n  --exhaustive \\\n  --download-all \\\n  --max-candidates=10\n```\n\n### Herno Live Debug Example\n\nThe repository includes a real fashion request example without any source page or image URL:\n\n```text\nexamples/requests/herno-cappa-nylon-ultralight-cammello.json\n```\n\nIt describes:\n\n- Brand: `Herno`\n- Model/code: `PI002223D`\n- Product: `Cappa In Nylon Ultralight Cammello`\n- Color: `Cammello`\n- Category: `Donna \u003e Maglie e camicie \u003e Felpe e maglie`\n- Material: `100% Nylon`\n\nRun it in a host Laravel app:\n\n```bash\nphp artisan product-image-discovery:debug-flow examples/requests/herno-cappa-nylon-ultralight-cammello.json \\\n  --fresh \\\n  --max-candidates=10 \\\n  --report=storage/app/product-image-discovery/debug/herno-flow.json\n```\n\nRun it from this package with Testbench on Windows PowerShell:\n\n```powershell\n$env:APP_KEY = 'base64:' + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(('a' * 32)))\n$env:DB_CONNECTION='sqlite'\n$env:DB_DATABASE=':memory:'\n$request = (Resolve-Path .\\examples\\requests\\herno-cappa-nylon-ultralight-cammello.json).Path\n$report = Join-Path (Get-Location) 'storage\\debug\\herno-flow.json'\n\u0026 
php vendor\\bin\\testbench product-image-discovery:debug-flow $request --migrate --fresh --max-candidates=10 --report=$report\n```\n\nRun it from this package with Testbench in a macOS/Linux shell:\n\n```bash\nexport APP_KEY=\"$(php -r 'echo \"base64:\".base64_encode(str_repeat(\"a\", 32));')\"\nexport DB_CONNECTION=sqlite\nexport DB_DATABASE=':memory:'\nrequest=\"$(pwd)/examples/requests/herno-cappa-nylon-ultralight-cammello.json\"\nreport=\"$(pwd)/storage/debug/herno-flow.json\"\nphp vendor/bin/testbench product-image-discovery:debug-flow \"$request\" --migrate --fresh --max-candidates=10 --report=\"$report\"\n```\n\nWith `BRAVE_SEARCH_API_KEY` configured, the command auto-creates a `brave-live-debug` provider and shows the live Brave image results. Search queries try product-code + color combinations before bare product-code searches, so fashion variants such as `PI002223D CAMMELLO` are tried before broader `PI002223D` searches. With AI enabled and `PRODUCT_IMAGE_DISCOVERY_AI_ATTACH_REMOTE_IMAGE=true`, it also sends each verified candidate image URL to the configured vision model and prints the full AI verification result.\n\nThe AI verifier is instructed to inspect the actual attached image first. Numeric vendor color ids in URLs or DOM metadata are not treated as color names: if the image visibly looks camel/tan/beige/cammello, the model can mark the requested color as equivalent; if the image visibly shows a different product or color, for example white shoes, it should mark `match=false`, `variant_safe=false`, `color_match=false` and `product_type_match=false`.\n\nIn the local Herno run, the trace found the official `us.herno.com` image, downloaded it under `product-image-discovery/{request_id}/{candidate_id}.jpg`, quality-checked it, printed the SHA-256 hash, and kept the request in `manual_review` because the source was not configured as auto-publishable. 
External results and AI wording can change, so treat the report as the source of truth for each run.\n\nDownloaded image paths:\n\n- The command prints the logical path stored on the configured Laravel disk, for example `product-image-discovery/1/4.jpg`.\n- In a host Laravel app with `PRODUCT_IMAGE_DISCOVERY_STORAGE_DISK=local`, that file is physically under the app storage directory, for example `storage/app/private/product-image-discovery/1/4.jpg`.\n- When running from this package with Testbench, the Laravel app is Testbench's skeleton app, so the physical file is under `vendor/orchestra/testbench-core/laravel/storage/app/private/product-image-discovery/1/4.jpg`.\n- If you change `PRODUCT_IMAGE_DISCOVERY_STORAGE_DISK`, inspect the root configured for that disk in the host app's `config/filesystems.php`.\n\nConsole screenshots:\n\n![Debug flow command ingest and search trace](resources/artisan-command-01.png)\n\n![Debug flow command candidate ranking and scoring trace](resources/artisan-command-02.png)\n\n## Quickstart\n\nSend a product-color payload:\n\n```bash\ncurl -X POST \"https://your-app.test/api/product-image-discovery/requests\" \\\n  -H \"Authorization: Bearer YOUR_TOKEN\" \\\n  -H \"Accept: application/json\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"client_id\": 10,\n    \"erp_model_color_id\": \"SHOE-123-BLACK\",\n    \"erp_model_id\": \"SHOE-123\",\n    \"brand\": \"Example Brand\",\n    \"supplier\": \"Main Supplier\",\n    \"sku\": \"SHOE-123-BLK-42\",\n    \"supplier_sku\": \"SUP-9988\",\n    \"model_code\": \"SHOE-123\",\n    \"color_code\": \"BLK\",\n    \"color_name\": \"Black\",\n    \"ean\": \"8050000000000\",\n    \"season\": \"FW26\",\n    \"category\": \"Sneakers\",\n    \"material\": \"Leather\"\n  }'\n```\n\nExample response:\n\n```json\n{\n  \"ok\": true,\n  \"request_id\": 1,\n  \"erp_model_color_id\": \"SHOE-123-BLACK\",\n  \"status\": \"queued\"\n}\n```\n\nSearch requests:\n\n```bash\ncurl 
\"https://your-app.test/api/product-image-discovery/requests/search?status=manual_review\" \\\n  -H \"Authorization: Bearer YOUR_TOKEN\" \\\n  -H \"Accept: application/json\"\n```\n\nApprove a candidate:\n\n```bash\ncurl -X POST \"https://your-app.test/api/product-image-discovery/requests/1/candidates/5/approve\" \\\n  -H \"Authorization: Bearer YOUR_TOKEN\" \\\n  -H \"Accept: application/json\"\n```\n\nReject a candidate:\n\n```bash\ncurl -X POST \"https://your-app.test/api/product-image-discovery/requests/1/candidates/5/reject\" \\\n  -H \"Authorization: Bearer YOUR_TOKEN\" \\\n  -H \"Accept: application/json\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"reason\": \"wrong_color\", \"notes\": \"The image shows the white variant.\"}'\n```\n\n## EAN / Barcode Matching\n\n`ean` is optional, but when your ERP/PIM/supplier has the real product barcode it is one of the strongest identity signals in the package.\n\nAccepted payload aliases:\n\n```text\nean\nbarcode\nbar_code\ngtin\ngtin13\ngtin14\n```\n\nThe API normalizes these aliases into the stored `ean` field. When `ean` is present:\n\n- search query generation tries the brand + EAN query first with the highest weight;\n- source patterns can use `{ean}`;\n- textual matches against the discovered page/image metadata count as a strong product identity match;\n- structured data matches against `gtin`, `gtin8`, `gtin12`, `gtin13`, `gtin14` or `ean` count as a strong match;\n- a structured GTIN/EAN mismatch is treated as a wrong-product risk;\n- even an exact EAN match does not override crucial contradictions such as wrong visible color, wrong product type, wrong brand, permission limits or low-quality image evidence.\n\nDo not invent barcodes for smoke tests. Leave `ean`/`barcode` empty unless the value comes from the real catalog, supplier or product feed.\n\n## Real Product Payload Examples\n\nThese examples are realistic ERP/PIM payloads for products that also exist on public fashion sites. 
The request intentionally does not include an image URL or product page URL: discovering that page/image is the job of the package. Ecommerce pages can change, go out of stock or block automated access, so treat these as smoke-test payloads rather than permanent fixtures. Do not invent EANs: leave `ean` empty unless your ERP/PIM has the real barcode.\n\nReady-to-edit request files are available in:\n\n```text\nexamples/requests/\n```\n\n- `erp-product-image-discovery-request.example.json`: generic ERP/PIM template without image/source URLs.\n- `nike-air-force-1-live.json`: concrete Nike smoke-test payload.\n- `herno-cappa-nylon-ultralight-cammello.json`: concrete Herno fashion payload for live discovery/debug flow testing.\n\n### Nike Air Force 1 07, White/White\n\nSource page: [Nike Air Force 1 07 men's shoes](https://www.nike.com/t/air-force-1-07-mens-shoes-jBrhbr)\n\n```json\n{\n  \"client_id\": 1,\n  \"erp_model_id\": \"NIKE-AF1-07\",\n  \"erp_model_color_id\": \"NIKE-AF1-07-CW2288-111\",\n  \"brand\": \"Nike\",\n  \"supplier\": \"Nike\",\n  \"supplier_sku\": \"CW2288-111\",\n  \"model_code\": \"Air Force 1 07\",\n  \"color_code\": \"CW2288-111\",\n  \"color_name\": \"White\",\n  \"category\": \"Sneakers\",\n  \"material\": \"Leather\"\n}\n```\n\n### Nike Air Force 1 07, White/White, LuisaViaRoma item\n\nSource page: [LuisaViaRoma Nike Air Force 1 07 sneakers](https://www.luisaviaroma.com/en-us/p/nike/women/82I-U3C014)\n\n```json\n{\n  \"client_id\": 1,\n  \"erp_model_id\": \"NIKE-AF1-07-WOMEN\",\n  \"erp_model_color_id\": \"LVR-82I-U3C014\",\n  \"brand\": \"Nike\",\n  \"supplier\": \"LuisaViaRoma\",\n  \"supplier_sku\": \"82I-U3C014\",\n  \"model_code\": \"Air Force 1 07\",\n  \"color_code\": \"82I-U3C014\",\n  \"color_name\": \"White\",\n  \"category\": \"Sneakers\",\n  \"material\": \"Calf leather\"\n}\n```\n\n### adidas Originals Samba OG, White/Black, LuisaViaRoma item\n\nSource page: [LuisaViaRoma adidas Originals Samba OG 
sneakers](https://www.luisaviaroma.com/en-us/p/adidas-originals/men/80I-T57018)\n\n```json\n{\n  \"client_id\": 1,\n  \"erp_model_id\": \"ADIDAS-SAMBA-OG\",\n  \"erp_model_color_id\": \"LVR-80I-T57018\",\n  \"brand\": \"adidas Originals\",\n  \"supplier\": \"LuisaViaRoma\",\n  \"supplier_sku\": \"80I-T57018\",\n  \"model_code\": \"Samba OG\",\n  \"color_code\": \"80I-T57018\",\n  \"color_name\": \"White/Black\",\n  \"category\": \"Sneakers\",\n  \"material\": \"Calf leather\"\n}\n```\n\n### New Balance 550, White/Grey, LuisaViaRoma item\n\nSource page: [LuisaViaRoma New Balance 550 sneakers](https://www.luisaviaroma.com/en-us/p/new-balance/men/78I-AM9016)\n\n```json\n{\n  \"client_id\": 1,\n  \"erp_model_id\": \"NEW-BALANCE-550\",\n  \"erp_model_color_id\": \"LVR-78I-AM9016\",\n  \"brand\": \"New Balance\",\n  \"supplier\": \"LuisaViaRoma\",\n  \"supplier_sku\": \"78I-AM9016\",\n  \"model_code\": \"550\",\n  \"color_code\": \"78I-AM9016\",\n  \"color_name\": \"White/Grey\",\n  \"category\": \"Sneakers\",\n  \"material\": \"Leather and synthetic\"\n}\n```\n\nAmazon is not used as a default example because product pages are highly personalized, protected and terms-sensitive. Use official brand pages or trusted fashion retailers first.\n\n## Configuration\n\nThe main config file is `config/product-image-discovery.php`.\n\nImportant options:\n\n- `route_prefix`: default `api/product-image-discovery`.\n- `route_middleware`: default `['api', 'auth:sanctum']`.\n- `abilities`: Sanctum ability names used by the package middleware.\n- `models`: override Eloquent models if your app extends package models.\n- `jobs.ingest`: override the entry job if you need custom orchestration.\n- `queues`: queue names per pipeline phase.\n- `storage.disk`: disk used for candidate assets.\n- `defaults`: search, quality and decision thresholds.\n\n## Trusted Sources\n\nTrusted source records let you prefer domains that are known to publish correct product images for a client or brand. 
A trusted source should improve confidence, but it should not bypass hard checks such as wrong color, wrong model, placeholder image or low-quality asset.\n\n## Optional Playwright Sidecar\n\nSome ecommerce pages render images only after JavaScript runs. The package keeps browser rendering out of PHP and delegates it to an optional Node sidecar.\n\nStart the sidecar:\n\n```bash\ncd sidecar\nnpm install\nnpm start\n```\n\nSidecar endpoints:\n\n- `GET /health`\n- `POST /render`\n\nEnvironment variables:\n\n```text\nSIDECAR_HOST=127.0.0.1\nSIDECAR_PORT=3100\nSIDECAR_SHARED_SECRET=change-me\nSIDECAR_DEFAULT_TIMEOUT_MS=15000\nSIDECAR_MAX_TIMEOUT_MS=30000\n```\n\nThe sidecar uses Playwright when available and falls back to static HTTP+HTML extraction when browser rendering is unavailable.\n\n## AI And Vision\n\nThe package includes an optional Laravel AI SDK integration for AI-assisted candidate verification. The core pipeline does not require an LLM: deterministic source/text/quality checks still run first, and AI output is stored as supporting evidence in `ai_analysis`.\n\nThis keeps local development, CI and production ingestion stable even when a model provider is unavailable.\n\nThe config defaults to Regolo through [`padosoft/laravel-ai-regolo`](https://github.com/padosoft/laravel-ai-regolo), while still supporting OpenAI, Anthropic and OpenRouter as alternate Laravel AI providers. 
AI verification is disabled by default, so the core pipeline remains deterministic and offline-friendly until you opt in with credentials.\n\n```env\nPRODUCT_IMAGE_DISCOVERY_AI_ENABLED=false\nPRODUCT_IMAGE_DISCOVERY_AI_PROVIDER=regolo\nPRODUCT_IMAGE_DISCOVERY_AI_TIMEOUT=45\nPRODUCT_IMAGE_DISCOVERY_AI_FAIL_SILENTLY=true\nPRODUCT_IMAGE_DISCOVERY_AI_ATTACH_REMOTE_IMAGE=false\nPRODUCT_IMAGE_DISCOVERY_AI_VISION_MODEL=\nPRODUCT_IMAGE_DISCOVERY_AI_DESCRIPTION_MODEL=Llama-3.3-70B-Instruct\nREGOLO_API_KEY=\nREGOLO_URL=https://api.regolo.ai/v1\nREGOLO_BASE_URL=\nOPENAI_API_KEY=\nOPENAI_URL=https://api.openai.com/v1\nOPENAI_BASE_URL=\nANTHROPIC_API_KEY=\nANTHROPIC_URL=https://api.anthropic.com/v1\nANTHROPIC_BASE_URL=\nOPENROUTER_API_KEY=\nOPENROUTER_URL=https://openrouter.ai/api/v1\nOPENROUTER_BASE_URL=\n```\n\nTo enable AI verification:\n\n```env\nPRODUCT_IMAGE_DISCOVERY_AI_ENABLED=true\nPRODUCT_IMAGE_DISCOVERY_AI_PROVIDER=regolo\nREGOLO_API_KEY=your-key\nREGOLO_URL=https://api.regolo.ai/v1\nPRODUCT_IMAGE_DISCOVERY_AI_DESCRIPTION_MODEL=Llama-3.3-70B-Instruct\n```\n\nFor Anthropic:\n\n```env\nPRODUCT_IMAGE_DISCOVERY_AI_ENABLED=true\nPRODUCT_IMAGE_DISCOVERY_AI_PROVIDER=anthropic\nANTHROPIC_API_KEY=your-key\nPRODUCT_IMAGE_DISCOVERY_AI_VISION_MODEL=claude-sonnet-4-5-20250929\nPRODUCT_IMAGE_DISCOVERY_AI_DESCRIPTION_MODEL=claude-haiku-4-5-20251001\n```\n\nFor OpenRouter:\n\n```env\nPRODUCT_IMAGE_DISCOVERY_AI_ENABLED=true\nPRODUCT_IMAGE_DISCOVERY_AI_PROVIDER=openrouter\nOPENROUTER_API_KEY=your-key\nOPENROUTER_URL=https://openrouter.ai/api/v1\nPRODUCT_IMAGE_DISCOVERY_AI_VISION_MODEL=your-openrouter-vision-model-id\n```\n\nBy default, remote image attachments are disabled and the verifier sends product/candidate metadata only. Set `PRODUCT_IMAGE_DISCOVERY_AI_ATTACH_REMOTE_IMAGE=true` when you want the selected provider/model to inspect the candidate image URL directly. 
Keep this opt-in because not every provider/model supports remote image attachments.\n\nRegolo is the package default because it gives Laravel applications an Italian/EU sovereign AI path through the same `laravel/ai` API. If you switch to Anthropic, OpenAI or OpenRouter, set model names supported by that provider.\n\n## Testing\n\nInstall PHP dependencies:\n\n```bash\ncomposer install\n```\n\nRun all PHP suites:\n\n```bash\nvendor/bin/phpunit --testsuite Unit,Feature,E2E\n```\n\nRun sidecar tests:\n\n```bash\ncd sidecar\nnpm test\n```\n\nThe current local verification used Herd PHP 8.4:\n\n```powershell\n\u0026 'C:\\Users\\lopad\\.config\\herd\\bin\\php84\\php.exe' vendor\\bin\\phpunit --testsuite Unit,Feature,E2E\n```\n\nIn a fresh offline environment, live sidecar/search/AI checks are skipped cleanly unless their credentials or URLs are provided. The most recent local run, with a real `BRAVE_SEARCH_API_KEY`, a real `ANTHROPIC_API_KEY` and remote AI image attachments enabled, reported:\n\n```text\n72 tests, 319 assertions, 1 skipped\n```\n\nThe skipped test is the live sidecar contract. Set `SIDECAR_E2E_URL` to test against a real running sidecar. 
Live search and AI checks require their provider credentials.\n\nRun the live AI verifier explicitly when you have a real Regolo, Anthropic, OpenRouter or OpenAI key in `.env`:\n\n```powershell\n\u0026 'C:\\Users\\lopad\\.config\\herd\\bin\\php84\\php.exe' vendor\\bin\\phpunit --testsuite E2E --filter LiveProductImageAiVerifierTest\n```\n\nWith `PRODUCT_IMAGE_DISCOVERY_AI_ATTACH_REMOTE_IMAGE=true`, this live test sends a real product image URL to the provider and should pass with `1 test`, `9 assertions`.\n\n## Database Tables\n\n- `product_image_discovery_requests`\n- `product_image_discovery_candidates`\n- `product_image_discovery_source_pages`\n- `product_image_discovery_settings`\n- `product_image_trusted_sources`\n- `product_image_search_providers`\n- `product_image_discovery_events`\n\n## Safety Notes\n\n- Use the package only for lawful, authorized discovery activity and only on sources you are allowed to access.\n- Respect robots.txt and source terms.\n- Prefer official supplier, brand or trusted retailer sources.\n- Do not publish images when license, ownership or product correctness is unclear.\n- Keep manual review in the flow for uncertain matches.\n- Treat watermarks, text overlays, placeholders and low-resolution images as quality risks.\n\n## Admin UI Guidance\n\nThis package stays headless. If you want to integrate a review/configuration experience inside an existing ecommerce admin, use [docs/ADMIN_UI_UX_GUIDELINES.md](docs/ADMIN_UI_UX_GUIDELINES.md). 
It describes the recommended vanilla JavaScript screens, components, filters, debug report viewer, guided debug-flow runner, provider credential status and API calls.\n\n## Roadmap\n\n**Recent additions (since v0.1.0):**\n\n- 6 live search providers wired through a single `SearchProviderManager`: `brave`, `tavily`, `exa`, `firecrawl`, `websearchapi`, `duckduckgo` (some land progressively across the PRs tracked in [docs/ROADMAP_SEARCH_PROVIDERS.md](docs/ROADMAP_SEARCH_PROVIDERS.md)).\n- Shared `AbstractHttpSearchProvider` so new drivers add ~80 LOC of provider-specific code.\n- GitHub Actions CI for PHP 8.3 / 8.4 + Node sidecar.\n- Junior-friendly [Quick Start](#quick-start-5-minutes-junior-friendly) that brings a fresh Laravel 13 app to a successful API response in 5 minutes without external keys.\n- Optional Regolo, Anthropic, OpenAI and OpenRouter providers for AI-assisted candidate verification.\n- EAN / GTIN / barcode alias normalization with strong-match scoring.\n\n**Planned:**\n\n- First-party SerpAPI and Google Custom Search drivers (templates already seeded).\n- Richer AI review signals while keeping deterministic checks as the publication gate.\n- Perceptual-hashing duplicate detection.\n- Image enhancement pipeline behind explicit config.\n- Host-admin UI integration examples (sister repo: [`padosoft/product_image_discovery_admin`](https://github.com/padosoft/product_image_discovery_admin)).\n- Runtime enforcement of `rate_limit_per_minute` (currently advisory only).\n\n## Contributing\n\nPull requests are welcome. Before opening one:\n\n1. Keep changes focused.\n2. Add or update tests for behavior changes.\n3. Run the PHP suite.\n4. Run the sidecar suite if you touched `sidecar/`.\n5. Update docs when behavior, configuration or architecture changes.\n\n## License \u0026 credits\n\nApache-2.0. 
See [LICENSE](LICENSE).\n\nSister packages in the Padosoft AI stack:\n\n- `padosoft/laravel-ai-regolo` -- first-class Regolo (Italian sovereign AI) provider for `laravel/ai`.\n- `padosoft/laravel-flow` -- saga / workflow orchestration for Laravel.\n- `padosoft/eval-harness` -- RAG + agent evaluation harness.\n- `padosoft/laravel-pii-redactor` -- PII redaction middleware for AI prompts.\n\nEach is independently usable. None requires the others. Pick what you need.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpadosoft%2Fproduct_image_discovery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpadosoft%2Fproduct_image_discovery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpadosoft%2Fproduct_image_discovery/lists"}