{"id":50791377,"url":"https://github.com/mayai-it/bandiradar","last_synced_at":"2026-06-14T15:01:32.032Z","repository":{"id":362501616,"uuid":"1258830664","full_name":"mayai-it/bandiradar","owner":"mayai-it","description":"Open-source engine that monitors Italian public funding (tenders, grants, incentives) and ranks them against a company profile — two-stage matcher, ANAC benchmarks, CLI + MCP.","archived":false,"fork":false,"pushed_at":"2026-06-12T10:59:07.000Z","size":1678,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-12T11:22:09.458Z","etag":null,"topics":["ai","cli","grants","incentivi","italy","mcp","ocds","public-procurement","python","ted","tenders"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mayai-it.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-04T00:50:10.000Z","updated_at":"2026-06-12T10:53:14.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mayai-it/bandiradar","commit_stats":null,"previous_names":["mayai-it/bandiradar"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/mayai-it/bandiradar","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayai-it%2Fbandiradar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayai-it%2Fba
ndiradar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayai-it%2Fbandiradar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayai-it%2Fbandiradar/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mayai-it","download_url":"https://codeload.github.com/mayai-it/bandiradar/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayai-it%2Fbandiradar/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34324004,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","cli","grants","incentivi","italy","mcp","ocds","public-procurement","python","ted","tenders"],"created_at":"2026-06-12T11:03:18.069Z","updated_at":"2026-06-14T15:01:31.994Z","avatar_url":"https://github.com/mayai-it.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BandiRadar\n\n[![PyPI](https://img.shields.io/pypi/v/bandiradar.svg)](https://pypi.org/project/bandiradar/)\n[![CI](https://github.com/mayai-it/bandiradar/actions/workflows/ci.yml/badge.svg)](https://github.com/mayai-it/bandiradar/actions/workflows/ci.yml)\n[![Python 
3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)\n\n\u003e Open-source engine that monitors Italian public funding opportunities\n\u003e (public tenders, grants, incentives), normalizes them into **one canonical\n\u003e model**, and ranks them against a company profile with a two-stage matcher.\n\u003e One normalized feed of **OPEN Italian tenders** (incl. sub-threshold gare) **+\n\u003e incentives**, behind a crawl that **repairs itself** when a portal drifts.\n\n**Runs offline, zero secrets · 9 live key-less sources + 5 LLM-assisted scrapers · includes live OPEN Italian tenders (incl. sub-threshold) · optional LLM Stage-2 · MIT**\n\n## Coverage\n\n\u003e **[Coverage map](docs/coverage-map.md)** — an honest map of where Italian public\n\u003e funding is published and what BandiRadar covers: open feeds vs gated, with the\n\u003e remaining gap stated plainly.\n\n[![Italian public funding — data coverage map](docs/coverage-map.svg)](docs/coverage-map.md)\n\n## Features\n\n- **Two-stage matcher** — a deterministic prefilter + LLM relevance scoring, with\n  a **zero-secrets offline heuristic fallback** (the LLM is optional).\n- **9 live, key-less sources** — TED (EU), incentivi.gov.it (national), `anac_pvl`\n  (national open tenders), and the regions Lombardia, Lazio, **Sicilia**,\n  **Emilia-Romagna** and **Trento (FEASR)**; the ninth is ANAC OCDS, a key-less\n  **historical / awarded-contracts** feed (analysis, not open calls). **5 LLM-assisted scrapers**\n  for API-less portals — Toscana, **Veneto**, **Piemonte**, **Puglia** and\n  **Sardegna** (live fetch needs an LLM key; `--sample` replays a recorded\n  extraction offline).\n- **Live OPEN Italian tenders** (`anac_pvl`) — the national *Pubblicità a Valore\n  Legale* feed of open, biddable gare, **incl. 
sub-threshold** ones TED never lists,\n  **no credentials** — the biddable feed the other sources lack.\n- **Self-healing crawl** — when a scraper's listing drifts, an LLM **re-derives the\n  crawl recipe** (data, not code); it's adopted **only if it exactly reproduces the\n  last-good results**, otherwise human-flagged — never silently. Demonstrated on\n  Toscana.\n- **ANAC historical-benchmark enrichment** — value/volume/seasonality context per\n  CPV division, optionally attached to matches.\n- **Document enrichment (PDF/OCR)** — optionally pull attachment PDFs into the\n  matcher so it reads the real requirements, not just title + CPV.\n- **`watch` monitor loop** (new/amended deltas) + **JSON/RSS export**.\n- **CLI + MCP server** — drive it from a shell or from an AI agent.\n- **Fully offline on `--sample`** — every demo and the whole test suite run with\n  no network and no secrets.\n\n## Table of contents\n\n- [Quickstart](#quickstart)\n- [Works across company types](#works-across-company-types)\n- [How it works](#how-it-works)\n- [Stage 2: LLM scoring](#stage-2-llm-scoring)\n- [Sources](#sources)\n- [Self-healing crawl](#self-healing-crawl)\n- [Intelligence and benchmarks](#intelligence-and-benchmarks)\n- [Document enrichment (PDF/OCR)](#document-enrichment-pdfocr)\n- [Watch and export](#watch-and-export)\n- [AI agents (MCP)](#ai-agents-mcp)\n- [Status](#status)\n- [Open core vs Pro](#open-core-vs-pro)\n- [Roadmap](#roadmap)\n- [Contributing](#contributing)\n- [License](#license) · [Data and licenses](#data-and-licenses)\n\n## Quickstart\n\n30 seconds, offline, no keys:\n\n```bash\npip install bandiradar\nbandiradar match --profile mayai --sample\n```\n\nOr from a source checkout:\n\n```bash\nuv sync\nuv run bandiradar match --profile mayai --sample\n```\n\nReal output on the bundled sample data:\n\n```text\n4 matching opportunities for 'MayAI':\n\n#1  score 55  [open]  Manifestazione d'interesse per l'accesso ai servizi per la digitalizzazione forniti da 
SoE AP EDIH\n     issuer: Ministero delle Imprese e del Made in Italy (Campania)   deadline: 2026-06-30\n     why: capability overlap: artificiale, digitalizzazione, intelligenza; within profile value range; national scope\n     https://www.medisdih.it/wp/\n\n#2  score 52  [open]  Voucher Digitalizzazione PMI 2025\n     issuer: LazioInnova (Lazio)   deadline: —\n     why: capability overlap: cloud, conforme, dati, digitalizzazione, machine; region match: Lazio\n     https://www.lazioinnova.it/bandi/voucher-digitalizzazione-pmi-2025/\n\n#3  score 44  [open]  Italia – Servizi di gestione dati – SERVIZIO DI GESTIONE … COMUNE DI ROCCA IMPERIALE (CS)\n     issuer: CENTRALE UNICA DI COMMITTENZA … CASSANO ALL'IONIO E TREBISACCE (—)   deadline: —\n     why: CPV prefix match (depth 2); capability overlap: dati; eu scope\n     https://ted.europa.eu/en/notice/-/detail/376324-2026\n\n#4  score 42  [closing_soon]  Donne e Impresa 2026\n     issuer: LazioInnova (Lazio)   deadline: 2026-06-10\n     why: capability overlap: data, software; region match: Lazio\n     risk: deadline closing soon\n     https://www.lazioinnova.it/bandi/donne-e-impresa-2026/\n```\n\n`--profile` accepts either a **bundled example name** (`mayai`,\n`medtech_lombardia`, `pmi_toscana`, … — packaged in the wheel, so the demos work\nfrom a `pip install` too) or a **path** to your own profile YAML.\n\nAdd `--json` for machine-readable output. Live opportunities come from 8 of the\n9 key-less sources (incentivi, TED, `anac_pvl` open tenders, and the regions\nLombardia, Lazio, Sicilia, Emilia-Romagna, Trentino); the ninth, `anac`, adds historical\nawarded-contract data (see [Sources](#sources) and [Status](#status)).\n\n## Works across company types\n\nBandiRadar isn't tuned to one company — it runs any profile against every source.\n`bandiradar batch` runs the bundled profile suite and compares results. Real
Real\noutput on `--sample` (offline heuristic):\n\n```text\nPROFILE                          #  TOP MATCH (score)                      BY SOURCE\n------------------------------------------------------------------------------------\nConsulenza Strategica S.r.l.    11  Avviso Trasformazioni - Servizi… (55)  incentivi:5 lazio:5 ted:1\nCostruzioni Lombarde S.r.l.      2  LAVORI DI FORMAZIONE MANUTENZIO… (56)  lombardia:1 ted:1\nTrattoria \u0026 Bottega S.r.l.       4  Manifestazione d'interesse per … (55)  incentivi:2 lazio:2\nManifattura Esempio S.r.l.       2  Voucher 3I - Investire in innov… (36)  incentivi:2\nMayAI                            4  Manifestazione d'interesse per … (55)  incentivi:1 lazio:2 ted:1\nMedForniture Lombardia S.r.l.    2  FORNITURA DI DISPOSITIVI PER EN… (76)  lombardia:2\nInnova Toscana S.r.l.           12  Bando 1.3.2 - Sostegno alle PMI… (60)  incentivi:3 ted:2 toscana:7\nStudio Associato Commercialis…   8  Efficienza energetica e rinnova… (52)  incentivi:2 lazio:5 ted:1\n```\n\nThe suite spans distinct Italian SME segments — AI/software (MayAI),\nmanufacturing, medical-devices (Lombardy), accounting, construction,\nhospitality/retail (keyword-driven, no CPV), and consultancy. Counts are real\nmatches on the tiny bundled sample; a segment can legitimately show few hits when\nthe sample doesn't cover it. 
Keyword/capability overlap ignores a curated list of\ngeneric procurement filler (`lavori`, `servizi`, `fornitura`, `manutenzione`, …),\nso matches reflect *sector-bearing* terms rather than boilerplate.\n\n```bash\nuv run bandiradar batch --sample              # human comparison table\nuv run bandiradar batch --sample --json       # machine-readable\nuv run bandiradar batch --sample --csv out.csv\n```\n\nWith an LLM key the same table gets sharper scores and ranking — see\n[Stage 2: LLM scoring](#stage-2-llm-scoring).\n\n## How it works\n\n```\n        ┌─────────┐   ┌───────────┐   ┌────────┐   ┌────────┐   ┌──────────┐\n sources│ INGEST  │──▶│ NORMALIZE │──▶│ STORE  │──▶│ MATCH  │──▶│ DELIVER  │\n        └─────────┘   └───────────┘   └────────┘   └────────┘   └──────────┘\n          fetch        raw→canonical   sqlite       2 stages     cli/mcp\n                                      + dedupe                  (dashboard=pro)\n```\n\nItalian public funding is scattered across dozens of fragmented sources.\nBandiRadar pulls opportunities from those sources, maps each into a single\ncanonical `Opportunity` model, and surfaces the few that matter for a given\ncompany — with reasons and deadlines. Matching is **two stages**:\n\n1. **Deterministic prefilter** — a pure, explainable function (region/geo, value\n   range, deadline, exclusions, and a relevance signal: the opportunity's CPV\n   codes prefix-matched against the profile's `cpv_interests`, or a keyword\n   overlap). Cuts thousands of rows to dozens. No LLM, no network.\n2. **LLM relevance** — scores the survivors `0–100` with reasons, matched\n   capabilities, eligibility flags, and risk notes. It ships with a **zero-secrets\n   offline fallback** (a deterministic heuristic), so the whole thing runs in CI\n   and in agent dev loops without any API key.\n\nA thin `core` service layer orchestrates the pipeline; the CLI and MCP server are\nshells over it with no business logic. 
Storage is stdlib SQLite with **change\ndetection**: a changed `content_hash` bumps the version, marks the row\n`amended`, and makes it re-notifiable (a tender *rettifica* should re-notify).\nSee [`ARCHITECTURE.md`](ARCHITECTURE.md) for the full design.\n\n## Stage 2: LLM scoring\n\nStage 2 is **off by default** (zero secrets → deterministic offline heuristic).\nTo enable real LLM relevance scoring:\n\n```bash\nuv sync --extra anthropic        # or: --extra openai  (optional SDKs)\n# in .env (gitignored):\n#   BANDIRADAR_LLM_PROVIDER=anthropic        # or openai\n#   ANTHROPIC_API_KEY=sk-ant-...             # or OPENAI_API_KEY=...\n#   BANDIRADAR_LLM_MODEL=...                 # optional; defaults to a cheap Haiku-class model\n```\n\n`.env` is auto-loaded — no manual `export`. With **no key** (or no SDK), the\nengine falls back to the heuristic, so CI and offline runs need nothing. When the\nLLM path is active you'll see a one-time `scoring via anthropic:\u003cmodel\u003e` on stderr.\n\nThe LLM is more discriminating than the heuristic — same prefiltered set, sharper\nscores/ranking (`bandiradar batch --sample`):\n\n```text\n                                  heuristic        LLM (anthropic, Haiku)\nCostruzioni  (real construction)    56               92   ← genuine match promoted\nCostruzioni  (IT doc-digitization)  (kept ~36)       15   ← cross-sector match demoted\nMayAI        top match            software-licenses  ML/data tender (88)\nStudio comm. software-licenses      50               25   ← weak fit penalized\nMedForniture medical devices        76               92   ← strong sector fit held\n```\n\n## Matching quality (measured)\n\nMost matching repos ask you to trust them. This one ships the numbers. 
On a\n**labelled gold set of 312 real opportunities × 8 company profiles**\n(`src/bandiradar/data/eval/`), here is the matcher quality — reproduce it any time\nwith `bandiradar eval --diagnostics` (offline for the heuristic; set an LLM key for\nthe LLM column):\n\n```text\nmin_score sweep — precision@5 / precision@10 / recall / false-positive-rate / returned\n                 P@5   P@10  recall  FPR    returned\nHEURISTIC (offline, zero-secret)\n  recall  (0)   0.34  0.20   0.87   0.29     99      ← the firehose\n  balanced(20)  0.34  0.20   0.87   0.29     99      ← scores too coarse to move\n  precision(40) 0.53  0.39   0.64   0.24     82\n  (60)          0.25  0.25   0.03   0.00      2      ← collapses; no usable cut\nLLM pointwise (anthropic Haiku)\n  recall  (0)   0.37  0.24   0.87   0.29     99      ← the firehose\n  balanced(20)  0.46  0.37   0.78   0.11     51\n  precision(40) 0.73  0.68   0.45   0.03     26      ← the operating point\n  (60)          0.81  0.80   0.41   0.02     20\n```\n\n**Read it:** with an LLM, raising the cutoff cleanly trades recall for precision —\nat `precision` (min_score ≥ 40) **P@5 0.73 / P@10 0.68 / FPR 0.03**, roughly double\nthe precision of the unfiltered firehose (**P@5 0.37 / FPR 0.29**) while still\nholding ~half the recall. The **offline heuristic** is a genuine zero-secret\nfallback (P@5 0.34) but its scores are too coarse to threshold — it has no usable\nprecision cut (it collapses to 2 items at 60). 
So **the LLM is the matcher to ship**,\nand precision modes are meaningful **only with a key**; keyless runs are\nrecall-oriented whatever the mode.\n\n### Operating-point modes\n\n`match` / `watch` / `batch` (and the MCP `search_opportunities`) take a `--mode`:\n\n| mode | cutoff | with an LLM key | use it for |\n|------|--------|-----------------|------------|\n| `precision` | `min_score ≥ 40` | P@5 0.73, P@10 0.68, FPR 0.03 | a tight shortlist |\n| `balanced` *(default)* | `min_score ≥ 20` | P@5 0.46, recall 0.78 | day-to-day |\n| `recall` | everything prefiltered | recall 0.87 | the monitor's safety net |\n\n`--min-score N` still works for power users (it overrides `--mode`).\n\n### Honest limits (also measured — `eval --diagnostics`)\n\n- **Embeddings** (semantic prefilter, the `embeddings` extra) are **built and\n  measured but net-negative** at the current recall ceiling: ~+0.02 recall for a\n  1.2–2.7× larger candidate set and higher FPR, so they ship **optional and off**.\n- **Recall ceiling is real.** Gate attribution shows the few relevant items the\n  prefilter drops are **4/6 correctly-closed bandi** (the deadline gate is right —\n  expired calls shouldn't surface) and only **2** a lexical gap; no over-strict gate\n  to tune.\n- **Listwise reranking** (`eval --rerank`) is an **optional cheaper top-k mode**\n  (one LLM call/profile vs N) that lifts top-k slightly but loses the calibrated\n  thresholding — so pointwise stays the default.\n\n## Sources\n\n| Source | What it delivers | Live fetch |\n|---|---|---|\n| **`incentivi`** | incentivi.gov.it (MIMIT) — the national catalogue of **business incentives / grants** (`kind=\"incentive\"`), national and regional. The grant side, and the source a digital SME profile actually matches. | ✅ Wired — the official IODL open-data export, no API key. |\n| **`ted`** | TED — Tenders Electronic Daily, the EU's portal for **above-threshold, OPEN, biddable tenders** (includes large Italian public tenders). 
| ✅ Wired — anonymous, no API key. |\n| **`anac_pvl`** | ANAC *Pubblicità a Valore Legale* — the national feed of **OPEN Italian public tenders** (`kind=\"tender\"`), incl. **sub-threshold** ones TED never lists; notices stay online until their deadline. This is the live open-calls feed the others lack. Carries buyer, oggetto, CIG, importo (sparse), CPV, region. CPV labels are resolved to official **8-digit CPV codes** (EU vocabulary; often coarse division-level); region is resolved province→comune(ISTAT)→buyer→national. *Caveats:* importo often absent; CPV codes can be coarse. | ✅ Wired — public JSON API, **no credentials**; keeps only still-open gare (deadline in the future). |\n| **`lombardia`** | Regione Lombardia — **regional / sub-threshold** public tenders (`kind=\"tender\"`), from the *Osservatorio Regionale* (Socrata SODA). Carries CPV, value, and province. | ✅ Wired — Socrata SODA, no API key. |\n| **`lazio`** | Regione Lazio — **regional business incentives** (`kind=\"incentive\"`), from the LazioInnova bandi portal (WordPress REST API). The source the MayAI dogfood profile matches. | ✅ Wired — WP REST, no API key. |\n| **`toscana`** | Regione Toscana — **regional business incentives** (`kind=\"incentive\"`), from the Sviluppo Toscana bandi portal. First **LLM-assisted scraper**: the portal has no field API, so an LLM extracts the canonical fields from each bando page. | ⚠️ Wired — live `fetch()` **needs an LLM key**; fields are extracted from the portal's HTML bando pages. `--sample` replays a recorded extraction offline. |\n| **`veneto`** | Regione del Veneto — **regional bandi** (tenders + incentives, classified per atto) from the SIU portal. **LLM scraper**: the portal's JSON layer stonewalls bots, so the server-rendered landing seeds the crawl and an LLM extracts each `Dettaglio` page. *Honest scope:* one visit surfaces the landing's ~10 atti; the daily monitor accumulates them. 
| ⚠️ Wired — live `fetch()` **needs an LLM key**; `--sample` replays a recorded extraction offline. |\n| **`piemonte`** | Regione Piemonte — **regional bandi** from the dedicated Drupal portal (`bandi.regione.piemonte.it`). **LLM scraper** seeded by the server-rendered Views listing filtered to **stato \"Aperto\"** server-side; an LLM extracts each detail page. | ⚠️ Wired — live `fetch()` **needs an LLM key**; `--sample` replays a recorded extraction offline. |\n| **`puglia`** | Regione Puglia — **PR 2021-2027 avvisi** from `pr2127.regione.puglia.it`. **LLM scraper** seeded by the portal's Liferay news-list fragment, keeping only items badged **\"Bando aperto\"**. (The historic sistema.puglia.it is a frameset service registry with no scadenze — not viable.) | ⚠️ Wired — live `fetch()` **needs an LLM key**; `--sample` replays a recorded extraction offline. |\n| **`sardegna`** | Regione Sardegna — **regional agevolazioni** from Sardegna Impresa (Drupal). **LLM scraper** seeded by the server-rendered `/it/agevolazioni` Views listing (structured per-item scadenza). | ⚠️ Wired — live `fetch()` **needs an LLM key**; `--sample` replays a recorded extraction offline. |\n| **`sicilia`** | Regione Siciliana — **regional FESR/FSC incentives** (`kind=\"incentive\"`), from EuroInfoSicilia. Standard WordPress posts under the \"Bandi e Avvisi\" category (config over the shared WP base + a `categories` filter). | ✅ Wired — WP REST, no API key. |\n| **`emilia_romagna`** | Regione Emilia-Romagna — **regional incentives** (`kind=\"incentive\"`) from the Politiche territoriali portal. Plone `Bando` content type with a **structured `scadenza_bando` deadline** (no text-parsing). | ✅ Wired — plone.restapi `@search`, no API key. |\n| **`trentino`** | Provincia Autonoma di Trento — **FEASR rural-development incentives** (`kind=\"incentive\"`), from a dati.trentino.it CKAN open-data CSV (carries currently-open bandi, with importo and open/close dates). | ✅ Wired — CKAN CSV, no API key. 
|\n| **`anac`** | ANAC / PNCP open-contracting (OCDS) data — **historical / awarded contracts** (\u003e €40k, monthly), not open calls. Surfaces mostly-**closed** opportunities (the matcher drops them); its value is market/history analysis. Region is absent in the data → `national`. | ✅ Wired — streams the Open Contracting mirror (CC BY 4.0, no API key), **capped** at 500 releases/run. |\n\n```bash\nuv run bandiradar fetch --source incentivi --sample   # offline, bundled real capture\nuv run bandiradar match --profile mayai --source incentivi --sample\nuv run bandiradar match --profile mayai --source ted --sample\n```\n\nThe `--sample` fixtures are **real captures** (`data/fixtures/*.json`).\n`incentivi` exercises the canonical superset on the grant side — no CPV, a funding\nrange, and an eligibility text the matcher reads. TED carries above-threshold\ncontracts often far larger than a micro-SME's range, so a small profile matches\nonly the few that fit — which is why incentive/national/regional sources matter too.\n\nA regional example (the bundled `medtech_lombardia` example profile, a Lombardy\nmedical-devices distributor) matches open Lombardy tenders, while the Lazio-only\nMayAI profile correctly drops them — regional filtering in action:\n\n```bash\nuv run bandiradar match --profile medtech_lombardia --source lombardia --sample\n# -\u003e 3 open medical-device tenders (region match, CPV 33*, within value range)\nuv run bandiradar match --profile mayai --source lombardia --sample\n# -\u003e No matching opportunities (Lazio profile, Lombardy bandi dropped on region)\n```\n\nAnd the dogfood closes the loop — MayAI **is** a Lazio company, and `lazio`\n(LazioInnova) is where it finally matches its own region:\n\n```bash\nuv run bandiradar match --profile mayai --source lazio --sample\n# -\u003e Voucher Digitalizzazione PMI 2025 (52), Donne e Impresa 2026 (42, closing soon)\n#    region match: Lazio; overlap: digitalizzazione, software, cloud, dati\n```\n\nSource 
data licensing is consolidated under [Data and licenses](#data-and-licenses).\n\n### Regional coverage\n\nTwo **reusable bases** make regions whose portals fit them config-only:\n`WordPressBandiSource` (`sources/wordpress.py`) for WP-REST portals — Lazio\n(LazioInnova) and **Sicilia** (EuroInfoSicilia, standard posts + a `categories`\nfilter) are configs over it — and `PloneBandoSource` (`sources/plone.py`) for the\nmany PAs running Plone with the AGID `Bando` content type, where a **structured\n`scadenza_bando`** beats text-parsing — **Emilia-Romagna** is the reference config.\nOpen-data tables get a dedicated adapter (Socrata for `lombardia`, a CKAN CSV for\n**`trentino`** FEASR). Each is a config/adapter + a fixture + a test, not core code.\n\nHonestly, though, that clean pattern is **rare** — most Italian regional agency\nportals are bespoke sites with no public open-bandi API, so each new region\nusually needs its own adapter (CKAN/Socrata like `lombardia`, or HTML scraping)\nrather than a one-line config. We don't ship half-working adapters: a portal\nthat's unreachable, retrospective-only, or API-less is skipped, not faked. 
The\nper-region status (what's been checked, where coverage is needed) lives in\n[`docs/regions.md`](docs/regions.md) — **regional contributions are very welcome.**\n\nFor those API-less portals, `toscana` is the reference **LLM-assisted scraper**:\n`fetch()` lists each bando's detail page, fetches the HTML, and an LLM extracts the\ncanonical fields (title, deadline, eligibility, amounts, keywords), cached per URL.\nThat extraction is I/O, so it lives in `fetch()` and **needs an LLM key** —\n`to_opportunities` stays pure, and `--sample` replays a recorded extraction with\n**zero secrets**:\n\n```bash\nuv run bandiradar fetch --source toscana --sample   # offline, recorded extraction\nuv run bandiradar match --profile pmi_toscana --source toscana --sample\n# -\u003e Bando Energia Imprese (92), Bando 1.3.2 Sostegno alle PMI/BEI (82),\n#    Bando Energia Immobili Imprese (78); public-only Energia Pubblico dropped to 15\n```\n\n## Self-healing crawl\n\n![A scraper's listing drifts and the crawl re-derives its own recipe — offline](docs/self-heal.gif)\n\nA scraper's fragile part is the **crawl** (the listing it depends on), not the\nextraction — the LLM already adapts to changed HTML. So the crawl is **data, not\ncode**: a `CrawlRecipe` (where the listing is + dotted paths to each field). That\nmakes drift detectable and the fix machine-checkable:\n\n1. **Spine** — every healthy crawl validates its results and **snapshots the\n   last-good ones** (the *golden*). A drift (renamed/moved fields → unusable refs)\n   is detected deterministically, not by a crash.\n2. **Healer** — on drift, an LLM is shown one live listing item and the broken\n   recipe, and asked to re-derive **only the paths** (data, never code).\n3. **Guard** — the candidate recipe is **adopted only if it exactly reproduces the\n   golden**. 
If it parses but differs (content genuinely changed) or stays broken,\n   the recipe is left untouched and the source is **flagged for a human** — never a\n   silent swap. Adoptions are auditable (`{recipe, adopted_at, reason, validated_by}`).\n\n```bash\nuv run python scripts/demo_self_heal.py   # offline, fake healer: drift → heal → recovered\n```\n\nFirst demonstrated on the `toscana` scraper. This keeps a scraper alive across\nsmall portal changes without shipping new code — and refuses to guess when it\ncan't prove the fix. Where the open engine stops and managed/premium coverage\nbegins is mapped in the **[coverage map](docs/coverage-map.md)**.\n\n## Intelligence and benchmarks\n\nA **separate** track (not the matcher) ingests ANAC *historical* OCDS data —\nawarded public contracts — and computes compact benchmarks per **CPV-division ×\nregion**: award value distribution (median, p25/p75, min/max), volume,\nseasonality (by year), and distinct-supplier counts.\n\n```bash\nuv run bandiradar benchmarks build --sample          # offline, bundled real capture\nuv run bandiradar benchmarks show --cpv 45           # region falls back to national\n```\n\nReal output on the bundled sample:\n\n```text\nCPV division 45  [national]\n  awards (count): 22   distinct suppliers: 21\n  value EUR: median 470,768  p25 121,649  p75 1,594,879\n  range: 68,117 – 11,369,083\n  by year: 2022:22\n```\n\n**Honest data caveats:**\n- The dataset is **retrospective** — *awarded* contracts (\u003e €40k), not open calls.\n- It has awards + suppliers but **no tenderers list**, so we **cannot** derive a\n  \"number of bidders\". 
We derive value/volume/seasonality/supplier counts only.\n- The release addresses carry city + postal code but **no region/NUTS**, so\n  benchmarks are **national-only** for now (`region` stays `None`); the model and\n  aggregation already support regional buckets for when a region-bearing source\n  arrives.\n\n### Enrichment: benchmarks in the matcher\n\nThe benchmarks are **optional matcher enrichment** (injected like the score\ncache — the matcher works fine without them). Add `--with-benchmarks` to `match`\nand each scored opportunity gets, for its CPV division, a historical-context\n*reason* plus a *value-sanity* risk note when it declares an estimated value:\n\n```bash\nuv run bandiradar benchmarks build --sample\nuv run bandiradar match --profile mayai --source ted --sample --with-benchmarks\n```\n\n```text\n#1  score 44  [open]  Italia – Servizi di gestione dati – SERVIZIO DI GESTIONE ... ROCCA IMPERIALE (CS)\n     why: CPV prefix match (depth 2); capability overlap: dati; eu scope; ANAC history (CPV 72, national): 8 awards, median EUR 104,326, p25-p75 EUR 71,619-183,410\n     https://ted.europa.eu/en/notice/-/detail/376324-2026\n```\n\nValue-sanity triggers when the opportunity declares a value — e.g. a Lombardy\nmedical-devices tender (`--profile medtech_lombardia --source\nlombardia --sample --with-benchmarks`):\n\n```text\n#1  score 76  [open]  FORNITURA DI DISPOSITIVI PER ENDOSCOPIA DIGESTIVA … ASST LODI …\n    why: ...; ANAC history (CPV 33, national): 7 awards, median EUR 1,183,540, p25-p75 EUR 104,190-1,640,000\n    risk: estimated value EUR 2,133,178 is above the historical p75 EUR 1,640,000 for this category\n```\n\nEnrichment is append-only on a **copy** of the cached match: the cache always\nstores the bare match, so repeated runs never double-append. The\n`search_opportunities` MCP tool takes the same `with_benchmarks` flag. 
ANAC data\nlicensing is under [Data and licenses](#data-and-licenses).\n\n## Document enrichment (PDF/OCR)\n\nMost of a tender's real requirements live in attachment PDFs (the\n*disciplinare*/*bando*), not in the title or CPV. With `--with-documents`, the\nmatcher fetches an opportunity's `document_urls`, extracts the text, and folds it\ninto **every** matching input — the prefilter keyword gate, the offline\nheuristic's overlap, and the LLM prompt — so requirements that exist only in the\nattachments can still drive (or sink) a match. Extracted text is cached per URL\n(SQLite), so PDFs aren't re-downloaded.\n\n```bash\nuv run bandiradar match --profile mine.yaml --source ted --sample --with-documents\n```\n\n- **Optional and injected** — like the score/benchmark caches; off by default.\n  The default install only needs `pypdf`.\n- **OCR for scanned PDFs** is the optional `ocr` extra:\n  `uv sync --extra ocr` plus the system binaries `tesseract` and `poppler`. When\n  absent, OCR is skipped cleanly (text-based PDFs still work). Enrichment never\n  raises into the matcher — a failed fetch/parse degrades to no added text.\n- **Honest source coverage:** only `ted` currently carries a per-notice document\n  link (the notice PDF). 
`lombardia`, `incentivi`, and `anac` expose no per-document\n  attachment URL in their data, so `document_urls` is empty for them (no faking) —\n  until those links are wired, `--with-documents` is a no-op there.\n\n\u003e `--with-documents` fetches PDFs over the network, so (unlike the default\n\u003e `--sample` flow) it is **not** offline.\n\n## Watch and export\n\n`watch` is a monitor loop: it fetches, applies the storage change-detection, and\nreports **only** matches whose opportunity is **new or amended** since the last\nwatch run (a per-profile marker is persisted; `--since` overrides it).\n\n```bash\n# 1st run: all current matches are \"new\"; a 2nd run reports nothing new\nuv run bandiradar watch --profile mayai --source incentivi --sample\n# write a feed instead of printing\nuv run bandiradar watch --profile mayai --source incentivi --sample --rss ~/feed.xml\n```\n\n`export` is the full, non-delta dump of current matches (`--json` or `--rss PATH`).\n\n**Scheduling is your cron** — this is open-core (single-user/local). For example:\n\n```cron\n0 8 * * *  cd /path/to/bandiradar \u0026\u0026 uv run bandiradar watch --profile mine.yaml --rss ~/feed.xml\n```\n\nManaged delivery (WhatsApp/email/alerts), scheduling SaaS, and multi-tenant\nhosting live in `bandiradar-pro`.\n\n### Live monitor (runs itself, daily)\n\nThis repo monitors itself. 
A GitHub Actions workflow\n([`.github/workflows/monitor.yml`](.github/workflows/monitor.yml)) runs **every day\nat 05:23 UTC** and on demand. The odd minute is deliberate: GitHub may delay or drop\nscheduled runs set at the top of the hour. Each run fetches every key-less source, plus\n`toscana` so that the [self-healing crawl](#self-healing-crawl) drift check runs in\nproduction, watches **every bundled profile**, and publishes the results to the orphan\n[**`monitor-data`** branch](../../tree/monitor-data):\n\n- `feeds/\u003cprofile\u003e.xml` / `feeds/\u003cprofile\u003e.json`: the new/amended matches per profile;\n- [`STATUS.md`](../../blob/monitor-data/STATUS.md): run date, per-source outcome and\n  counts, new matches per profile, and the crawl-recipe state (`ok` / `drift` /\n  `healed` / `flagged`).\n\nIt runs **with zero secrets** (guardrail 1): without a key it uses recall mode and the\noffline heuristic matcher, and crawl drift is only *detected*. Add the optional\n`ANTHROPIC_API_KEY` repo secret and the same workflow scores with the LLM **and**\nactivates the crawl **healer** (a drifted recipe is re-derived automatically and adopted\nonly if it reproduces the golden sample exactly). The data branch is kept **flat**: one\nforce-pushed commit per run, so generated state never bloats the repo history. A run\nfails only when **every** source fails; partial failures are reported as warnings in\n`STATUS.md`.\n\nIt fetches **once per run** (the first profile fetches every source; the others\nreuse the DB via `watch --skip-fetch`), so the whole job takes **~5 minutes** instead\nof 30+. Every request sends an identifying `User-Agent` and uses a short connect timeout.\n\n**Operational protections (v0.5.1).** With an LLM key, a per-run spend cap\n(`BANDIRADAR_LLM_BUDGET`) limits how many new scorings a run performs. Items beyond the\ncap are *deferred* to later runs rather than dropped, and the score cache amortizes the\ncost: already-scored items are served from the cache, so later runs spend their budget\non the backlog. 
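The deferral idea can be illustrated with a minimal sketch. It assumes, hypothetically, that the cap counts new (uncached) scorings per run and that the cache persists between runs; `BudgetedScorer`, `run`, and the stand-in `score` are illustrative names rather than BandiRadar's API, and the real cap may be measured in spend rather than calls. Because cache hits cost nothing, each later run spends its budget on the backlog:

```python
from dataclasses import dataclass, field


@dataclass
class BudgetedScorer:
    # Hypothetical: 'budget' caps NEW (uncached) scorings per run,
    # while 'cache' persists across runs like a score cache would.
    budget: int
    cache: dict = field(default_factory=dict)

    def score(self, item_id: str) -> float:
        # Stand-in for an LLM call; deterministic so the sketch is testable.
        return (sum(map(ord, item_id)) % 100) / 100

    def run(self, item_ids: list) -> tuple:
        spent = 0
        scored, deferred = {}, []
        for item_id in item_ids:
            if item_id in self.cache:
                # Cache hit: reused for free, no budget consumed.
                scored[item_id] = self.cache[item_id]
            elif spent < self.budget:
                # Cache miss within budget: score it and remember the result.
                self.cache[item_id] = self.score(item_id)
                scored[item_id] = self.cache[item_id]
                spent += 1
            else:
                # Over budget: defer to a later run instead of dropping it.
                deferred.append(item_id)
        return scored, deferred


scorer = BudgetedScorer(budget=2)
items = ['a', 'b', 'c', 'd', 'e']
for run_no in (1, 2, 3):
    scored, deferred = scorer.run(items)
    # Run 1 defers c, d, e; run 2 defers only e; run 3 defers nothing.
    print(run_no, sorted(scored), deferred)
```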
Before each\npublish, `bandiradar prune` trims stale `raw_docs` of long-closed opportunities and\nold run rows, then runs `VACUUM`, to keep the data branch well under GitHub's file-size\nlimit without touching the score cache or crawl recipes. The long step is also\ntime-boxed so that `doctor`, the `STATUS.md` update, and the publish step always run: if\nthe run is **truncated**, `STATUS.md` says so (`⚠️ Run truncated: X/N profiles completed`)\ninstead of republishing stale numbers.\n\n\u003e **Known limit: PA hosts that geo-block datacenter IPs (and the optional relay).**\n\u003e Some open endpoints drop datacenter or non-EU IPs at the connection level.\n\u003e `incentivi.gov.it` does, so GitHub-hosted runners could not reach it (it was verified\n\u003e to work from residential and EU IPs). This cannot be fixed in client code. The engine\n\u003e supports an **optional HTTP relay** for exactly this case: requests to allowlisted\n\u003e hosts are transparently rewritten to `\u003crelay\u003e?u=\u003coriginal-url\u003e` at the HTTP\n\u003e layer (no adapter changes), driven by three env vars / repo secrets:\n\u003e `BANDIRADAR_RELAY_URL`, `BANDIRADAR_RELAY_TOKEN` (sent as `X-Relay-Token`; read\n\u003e from secrets, never committed to the repo), and `BANDIRADAR_RELAY_HOSTS` (a\n\u003e comma-separated allowlist).\n\u003e\n\u003e **In this repo's live monitor the incentivi gap is solved**: the relay runs as a\n\u003e Vercel function pinned to an EU region (`fra1`, so egress is from the EU). The\n\u003e reference source is in [`infra/vercel-relay/`](infra/vercel-relay/); the deployment\n\u003e and its token are the operator's own infrastructure. An earlier Cloudflare Worker\n\u003e attempt did **not** work: Workers execute at the edge location nearest the *caller*,\n\u003e so a US runner got US egress and the geo-block stayed (HTTP 522).\n\u003e **With these variables unset, nothing changes**: the repo stays keyless and fully\n\u003e functional, and the gap returns, visibly classified in `STATUS.md` and never\n\u003e silently. 
The pre-flight step probes incentivi both directly and\n\u003e via the relay and logs both outcomes.\n\u003e *(TED's earlier 403 from CI was a different issue, a block on the default User-Agent,\n\u003e and is **fixed**: with our identifying User-Agent, TED fetches succeed from the runners.)*\n\n## AI agents (MCP)\n\nBandiRadar ships a thin [MCP](https://modelcontextprotocol.io) server (FastMCP),\nso you can drive it from Claude or any other MCP client. Six tools:\n\n`list_sources` · `fetch_opportunities` · `search_opportunities` ·\n`score_opportunity` · `get_matches` · `get_profile`\n\n```bash\nuv run bandiradar mcp\n```\n\nRegistration and an offline example session are in [`docs/MCP.md`](docs/MCP.md).\n\n## Status\n\n- ✅ **Offline, zero-secret** — every `--sample` demo above except `--with-documents`\n  (which downloads PDFs), and the whole test suite, run with no network and no API key.\n- ✅ **9 live key-less sources** — `incentivi`, `ted`, `anac_pvl`, `lombardia`,\n  `lazio`, `sicilia`, `emilia_romagna`, `trentino` (open calls) plus `anac`\n  (historical). `--sample` keeps them offline against recorded real captures.\n- ✅ **Live OPEN Italian tenders** — `anac_pvl` (Pubblicità a Valore Legale) is the\n  national feed of open, biddable tenders (*gare*), including sub-threshold ones that\n  TED never lists; it needs no credentials and keeps only still-open notices.\n- ✅ **5 LLM-assisted scrapers** — `toscana`, `veneto`, `piemonte`, `puglia`,\n  `sardegna`: live `fetch()` extracts fields from each portal's HTML bando pages\n  with an LLM (needs a key); `--sample` replays a recorded extraction offline.\n- ✅ **Self-healing crawl** — a drifted scraper listing triggers an LLM that\n  re-derives the crawl recipe (data, not code); the new recipe is adopted only when it\n  exactly reproduces the last-good results, and is otherwise flagged for a human.\n- ✅ **Stage-2 LLM scoring is wired and working** (optional); with no key it\n  transparently uses the offline heuristic — a proxy, not real semantic relevance.\n- ✅ **Live ANAC/PNCP fetch is wired** — streams the Open Contracting OCDS mirror\n  (key-less), capped at 500 releases/run. 
The data is **retrospective** (awarded\n  contracts), so it surfaces mostly closed opportunities: useful for history and\n  market analysis, not as a feed of open calls.\n- ✅ **Live-fetch robustness shipped (0.2.0)** — retries/backoff, pagination,\n  per-source isolation (one source failing never aborts the others), per-record\n  quarantine, and a `doctor` diagnostic. A single malformed record is tolerated, never\n  fatal.\n- ⏳ **Honest limitation:** the real residual gap is **coverage**, not robustness.\n  Italian regional funding is fragmented across bespoke API-less portals, and the\n  richest tender documents are gated. See the\n  **[coverage map](docs/coverage-map.md)** for the open-vs-gated landscape and where\n  the open/Pro boundary falls.\n\n## Open core vs Pro\n\nAnything a single user can run locally is **open**. Anything *managed*,\n*multi-client*, or *a delivery channel* lives in the private `bandiradar-pro`,\nwhich depends on this package, never the reverse.\n\n| Component | `bandiradar` (this repo, MIT) | `bandiradar-pro` (private) |\n|---|---|---|\n| Engine (ingest/normalize/match) | ✅ | imports it |\n| Source framework (`Source` interface + registry) | ✅ | |\n| Reference adapters (ANAC, incentivi.gov.it) | ✅ | |\n| Two-stage matcher (incl. offline fallback) | ✅ | |\n| CLI + MCP server | ✅ | |\n| Dashboard (web UI) | | ✅ |\n| Premium / regional source adapters | | ✅ |\n| Delivery channels (WhatsApp, email, alerts) | | ✅ |\n| Multi-tenant, managed hosting, scheduling SaaS | | ✅ |\n\n## Roadmap\n\n**Shipped**\n- Canonical model + `Source` framework + two-stage matcher (deterministic\n  prefilter + LLM relevance with a zero-secrets offline fallback) + SQLite with\n  change detection + CLI + MCP server.\n- Live sources: **TED** (EU open tenders), **incentivi.gov.it** (national\n  incentives), **`anac_pvl`** (national OPEN tenders — Pubblicità a Valore Legale,\n  incl. 
sub-threshold), and the regions **Lombardia** (Socrata tenders), **Lazio**\n  (LazioInnova incentives), **Sicilia** (EuroInfoSicilia FESR/FSC), **Emilia-Romagna**\n  (Plone `Bando`) and **Trentino** (CKAN FEASR), all key-less; **ANAC OCDS** wired as\n  a capped, key-less historical / awarded-contracts feed (analysis, not open calls).\n- **CPV resolver** (Italian CPV labels → official 8-digit EU codes) + region\n  fallback (province → comune/ISTAT → buyer → national), with measured keyless recall\n  gains on tender profiles.\n- **LLM-assisted scrapers** for API-less regional portals: **Regione Toscana**\n  (Sviluppo Toscana) was the first instance, followed by Veneto, Piemonte, Puglia and\n  Sardegna (live fetch needs an LLM key).\n- **Self-healing crawl** — crawl recipes as data + drift detection + golden-sample\n  guard + an LLM recipe healer (gated adoption; human-flagged otherwise).\n- **[Coverage map](docs/coverage-map.md)** — honest open-vs-gated landscape of\n  Italian funding data.\n- **Intelligence track:** ANAC historical benchmarks + optional matcher\n  enrichment (`--with-benchmarks`).\n- **`watch` monitor loop** (new/amended deltas) + **JSON/RSS export**.\n- **Embeddings semantic prefilter** — built and **measured; ships optional and off**\n  (net-negative at the current recall ceiling; see *Honest limits* under\n  [Matching quality](#matching-quality-measured)).\n\n**Upcoming**\n- More community/regional source adapters via the `Source` framework (ten regional\n  and provincial portals are covered so far; the per-territory recon in the\n  [coverage map](docs/coverage-map.md) shows where help is welcome).\n- `bandiradar-pro` (private): dashboard, WhatsApp/email delivery, scheduling\n  SaaS, multi-tenant hosting.\n\n## Contributing\n\nEvery source is a `fetch` plus a pure `to_opportunities`, together with a recorded\nfixture and a test, so adding one means adding a new file with no core changes. 
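A minimal sketch of that shape follows. Every name in it (`Opportunity`, `Source`, `ToySource`, the `title` field) is hypothetical and not BandiRadar's actual interface; it only shows why the split helps: the network call lives in `fetch`, while `to_opportunities` is a pure mapping that a test can drive from a recorded fixture with no network access.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Opportunity:
    # Hypothetical, trimmed-down canonical record.
    source: str
    id: str
    title: str


class Source(Protocol):
    # Hypothetical interface: fetch() does the I/O, to_opportunities() is pure.
    name: str

    def fetch(self) -> list: ...

    def to_opportunities(self, raw: list) -> list: ...


class ToySource:
    name = 'toy'

    def fetch(self) -> list:
        # A real adapter would call the portal's API here.
        raise NotImplementedError('network access is out of scope for this sketch')

    def to_opportunities(self, raw: list) -> list:
        # Pure mapping: no I/O, same input gives same output.
        # Records without an id are skipped (an illustrative choice).
        return [
            Opportunity(source=self.name, id=str(r['id']), title=r.get('title', '').strip())
            for r in raw
            if r.get('id') is not None
        ]


# A recorded fixture stands in for a captured API response.
FIXTURE = [
    {'id': 101, 'title': '  Digital transformation grant for SMEs  '},
    {'id': None, 'title': 'record without an id'},
]

source: Source = ToySource()
print(source.to_opportunities(FIXTURE))
```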
See\n[`CONTRIBUTING.md`](CONTRIBUTING.md) and the `add-a-source` skill\n(`skills/add-a-source/`) for the full copy-pasteable template; the playbook also\nlives in `CLAUDE.md` (\"How to add a new Source\").\n\nEach source also has an offline **contract test** against a recorded real response\n(`tests/cassettes/`), plus an **opt-in live drift check** that runs only with\n`uv run pytest -m live` (never in CI). See [`CONTRIBUTING.md`](CONTRIBUTING.md) for\nhow to run it and how to re-record a cassette when an API changes.\n\n## License\n\nMIT © MayAI — see [`LICENSE`](LICENSE).\n\n## Data and licenses\n\nBandiRadar consumes public open data. Each source keeps its own license, and you\nmust honor the terms its operator sets:\n\n- **TED — Tenders Electronic Daily (EU)** — the EU's public procurement journal\n  (Publications Office of the EU). Notice data is reusable under the Commission's\n  open-data reuse policy; the live `ted` fetch uses the anonymous public\n  `api.ted.europa.eu` search API (no key). Attribute TED / the Publications Office.\n- **incentivi.gov.it (IODL 2.0)** — published by the Ministero delle Imprese e del\n  Made in Italy under the\n  [Italian Open Data License v2.0](https://www.dati.gov.it/iodl/2.0/) (attribution\n  required). The live `incentivi` fetch hits the same open-data export endpoint that the\n  portal's own \"Scarica dataset\" (Download dataset) button uses (there is no separate\n  static file; the download is built client-side from that endpoint).\n- **Regione Lombardia (CC0 1.0)** — dataset `k6cb-4hbm` (*Bandi di gara —\n  Osservatorio Regionale*), via the dati.lombardia.it Socrata SODA API.\n- **Regione Lazio / LazioInnova** — bandi published by LazioInnova (the regional\n  development agency) and read via its WordPress REST API\n  (`lazioinnova.it/wp-json`). 
Source: LazioInnova / Regione Lazio; attribute the\n  source when reusing.\n- **Regione Toscana / Sviluppo Toscana** — bandi published on the Sviluppo Toscana\n  portal (`sviluppo.toscana.it`); detail-page links come from its WP REST listing\n  and the fields are LLM-extracted from each public bando page. Source: Sviluppo\n  Toscana / Regione Toscana; attribute the source when reusing.\n- **Regione Siciliana / EuroInfoSicilia** — FESR/FSC bandi published on\n  `euroinfosicilia.it` and read via its WordPress REST API (category \"Bandi e\n  Avvisi\"). Source: Regione Siciliana — EuroInfoSicilia; attribute the source.\n- **Regione del Veneto** — atti published on the public SIU bandi portal\n  (`bandi.regione.veneto.it`); the landing page seeds the crawl and the fields are\n  LLM-extracted from each public `Dettaglio` page. Source: Regione del Veneto;\n  attribute the source when reusing.\n- **Regione Piemonte** — bandi published on `bandi.regione.piemonte.it`; the\n  public Views listing seeds the crawl and the fields are LLM-extracted from each\n  public detail page. Source: Regione Piemonte; attribute the source when reusing.\n- **Regione Puglia** — avvisi published on the PR Puglia 2021-2027 portal\n  (`pr2127.regione.puglia.it`); the public news listing seeds the crawl and the\n  fields are LLM-extracted from each public detail page. Source: Regione Puglia;\n  attribute the source when reusing.\n- **Regione Autonoma della Sardegna / Sardegna Impresa** — agevolazioni published\n  on `sardegnaimpresa.eu`; the public listing seeds the crawl and the fields are\n  LLM-extracted from each public detail page. Source: Regione Autonoma della\n  Sardegna; attribute the source when reusing.\n- **Regione Emilia-Romagna** — bandi published on the regional Politiche\n  territoriali portal (`politicheterritoriali.regione.emilia-romagna.it`) and read\n  via plone.restapi (`portal_type=Bando`). 
Source: Regione Emilia-Romagna; attribute\n  the source.\n- **Provincia Autonoma di Trento (CC BY 4.0)** — FEASR bandi calendar from the\n  `dati.trentino.it` CKAN open-data portal. Source: Provincia Autonoma di Trento;\n  attribute the source.\n- **ANAC public contracts (CC BY 4.0)** — via the\n  [Open Contracting Data mirror](https://data.open-contracting.org/en/publication/117/);\n  both the `anac` source and the intelligence track stream the gzipped JSONL\n  memory-safely (line by line) through the shared reader in `bandiradar/ocp.py`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmayai-it%2Fbandiradar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmayai-it%2Fbandiradar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmayai-it%2Fbandiradar/lists"}