{"id":50168060,"url":"https://github.com/andreliar/pea-audit","last_synced_at":"2026-05-24T22:01:13.642Z","repository":{"id":360044523,"uuid":"1248477681","full_name":"AndreLiar/pea-audit","owner":"AndreLiar","description":"Audit French PEA (Plan d'Épargne en Actions) eligibility of ETFs by reading their KID with a vision LLM (Gemma 4 via Ollama Cloud). Pluggable LLM + KID source registry, versioned prompts, 13/13 eval baseline.","archived":false,"fork":false,"pushed_at":"2026-05-24T18:44:53.000Z","size":202,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-24T19:28:32.736Z","etag":null,"topics":["etf","fastapi","france","gemma","kid","llm","ollama","pea","personal-finance","priips","streamlit","vision-llm"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/pea-audit/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AndreLiar.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":"audit_cli.py","citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-24T17:36:54.000Z","updated_at":"2026-05-24T18:44:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/AndreLiar/pea-audit","commit_stats":null,"previous_names":["andreliar/pea-audit"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/AndreLiar/pea-audit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreLiar%2Fpea-
audit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreLiar%2Fpea-audit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreLiar%2Fpea-audit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreLiar%2Fpea-audit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AndreLiar","download_url":"https://codeload.github.com/AndreLiar/pea-audit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreLiar%2Fpea-audit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33452033,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-24T19:21:36.376Z","status":"ssl_error","status_checked_at":"2026-05-24T19:21:10.562Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["etf","fastapi","france","gemma","kid","llm","ollama","pea","personal-finance","priips","streamlit","vision-llm"],"created_at":"2026-05-24T22:01:09.406Z","updated_at":"2026-05-24T22:01:13.635Z","avatar_url":"https://github.com/AndreLiar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
pea-audit\n\n[![PyPI](https://img.shields.io/pypi/v/pea-audit.svg)](https://pypi.org/project/pea-audit/)\n[![Python](https://img.shields.io/pypi/pyversions/pea-audit.svg)](https://pypi.org/project/pea-audit/)\n[![CI](https://github.com/AndreLiar/pea-audit/actions/workflows/ci.yml/badge.svg)](https://github.com/AndreLiar/pea-audit/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n\nAudit French **PEA** (Plan d'Épargne en Actions) eligibility of ETFs by reading their **KID** (Key Information Document) with a vision LLM. Tells you whether a fund is actually eligible for a French PEA account — with verbatim citations from the document.\n\n\u003e **What is a PEA?** France's tax-sheltered stock account (€150k cap, gains tax-free after 5 years). It only accepts EU-domiciled equities, or UCITS funds that *synthetically* replicate non-EU indexes (S\u0026P 500, MSCI World, Nasdaq, …) via a swap on an EU-equity basket. Physical-replication funds of non-EU indexes — most iShares Core / Vanguard ETFs — don't qualify. **This library tells you which side of that line your fund is on.**\n\n\u003e **What's in this repo?** Two things: **`pea-audit`** — the library you `pip install` (lives in `pea_audit/`) — and **ETFTracker** — a reference app that consumes it (Streamlit dashboard + CLI + FastAPI at the repo root, plus `etftracker/` helper code). 
Most of this README is about the library; see [ETFTracker.md](ETFTracker.md) for the app side (French).\n\n**Releases:** see [CHANGELOG.md](CHANGELOG.md) — latest is v0.2.1, current major-feature drop is [v0.2.0](https://github.com/AndreLiar/pea-audit/releases/tag/v0.2.0) (async API, typed Enums, prompt-version cache key, SSRF guard, 51 unit tests).\n\n```\n$ python audit_cli.py samples/amundi_pea_monde_kid.pdf\n📄 Audit de : samples/amundi_pea_monde_kid.pdf\n\n  ✅ ÉLIGIBLE PEA    (confiance : high)\n\n  Émetteur     : Amundi\n  ISIN         : FR001400U5Q4\n  Indice       : MSCI World Index EUR\n  Réplication  : synthetic_swap\n\n  Le fonds est éligible au PEA car il utilise une réplication synthétique\n  via swap (IFT) avec un panier d'actions européennes ≥75%.\n\n  Preuves :\n    p.1 — « Le Fonds est éligible au Plan d'Épargne en Actions français (PEA) ... »\n    p.1 — « La performance sera échangée contre celle de l'Indice de Référence ... »\n```\n\n## Why\n\nPEA eligibility is opaque and *changes silently* — issuers re-domicile, swap counterparties, switch to ESG-screened variants, and rename funds (e.g. Amundi PEA Nasdaq-100 silently became \"Amundi PEA US Tech Screened\" under the same ticker). Brokers don't always flag this. 
`pea-audit` reads each fund's KID directly and tells you what the document actually says, with quotes you can verify.\n\n## Install\n\n```bash\npip install pea-audit\n```\n\nOptional extras:\n\n```bash\npip install 'pea-audit[observability]'  # adds Langfuse for LLM tracing\npip install 'pea-audit[evals]'           # adds pyyaml for the eval suite\npip install 'pea-audit[dev]'             # everything above + python-dotenv\n```\n\n## Quickstart\n\nGet an Ollama Cloud key at \u003chttps://ollama.com/settings/keys\u003e, then:\n\n```python\nfrom pathlib import Path\nfrom pea_audit import audit_pdf, VerdictCache\nfrom pea_audit.llm import OllamaCloudClient\n\n# Ollama Cloud keys look like \"\u003c32-hex-char id\u003e.\u003c24-char secret\u003e\"\n# (not \"sk-...\" — that's the OpenAI format)\nllm = OllamaCloudClient(api_key=\"abcdef0123456789abcdef0123456789.EXAMPLE-KEY-DO-NOT-USE\")\n\n# Cache is opt-in. Library never writes to disk unless you supply one.\ncache = VerdictCache(Path(\"./cache\"))\n\nverdict = audit_pdf(\"path/to/kid.pdf\", llm=llm, cache=cache)\n\nprint(verdict.eligible)        # Eligible.YES | NO | UNCERTAIN  (also == \"yes\" / \"no\" / \"uncertain\")\nprint(verdict.replication)     # Replication.PHYSICAL | SYNTHETIC_SWAP | UNKNOWN\nprint(verdict.isin)            # deterministic — extracted from PDF text + Luhn-validated\nfor c in verdict.evidence:\n    print(f\"  p.{c.page}: « {c.quote} »\")\n```\n\n**Don't have a KID PDF handy?** The repo ships `samples/amundi_pea_monde_kid.pdf` — clone or download it to try the example end-to-end on a real (PEA-eligible) Amundi fund.\n\n**More examples?** See [`examples/`](examples/) — 5 runnable demos (basic audit, audit-by-ticker, custom `VisionLLM`, custom `KIDSource`, async batch with `asyncio.gather`).\n\n### Audit by ticker (built-in URL registry)\n\n```python\nfrom pathlib import Path  # needed for the cache and kid_dir paths below\nfrom pea_audit import audit_ticker, VerdictCache\nfrom pea_audit.llm import OllamaCloudClient\n\nllm = 
OllamaCloudClient(api_key=\"\u003cyour-ollama-cloud-key\u003e\")\ncache = VerdictCache(Path(\"./cache\"))\n\nresult = audit_ticker(\"EWLD.PA\", llm=llm, kid_dir=Path(\"./kids\"), cache=cache)\n\nfrom pea_audit import Eligible\nassert result.verdict.eligible is Eligible.YES  # also: == \"yes\" still works\n```\n\nBuilt-ins ship for the most common French ETFs (Amundi PEA range, BNP Paribas Easy). Add more:\n\n```python\nfrom pea_audit.sources import register_source, KIDSource\n\nregister_source(KIDSource(\n    ticker=\"LYX.PA\",\n    isin=\"FR0010411884\",\n    url=\"https://www.lyxoretf.fr/.../kid.pdf\",\n    issuer=\"Lyxor\",\n))\n```\n\n### Async usage\n\nUse `aaudit_pdf` or `aaudit_ticker` with `AsyncOllamaCloudClient` from `asyncio` code (FastAPI handlers, webhook receivers, parallel batches):\n\n```python\nimport asyncio\nfrom pathlib import Path\nfrom pea_audit import aaudit_ticker, VerdictCache\nfrom pea_audit.llm import AsyncOllamaCloudClient\n\nasync def main():\n    llm = AsyncOllamaCloudClient(api_key=\"...\")\n    cache = VerdictCache(Path(\"./cache\"))\n    # 4 audits run concurrently: wall-clock time is roughly that of the slowest audit, not the sum of four\n    return await asyncio.gather(*[\n        aaudit_ticker(t, llm=llm, kid_dir=Path(\"./kids\"), cache=cache)\n        for t in [\"EWLD.PA\", \"PAEEM.PA\", \"ESE.PA\", \"PANX.PA\"]\n    ])\n\nasyncio.run(main())\n```\n\n`AsyncVisionLLM` is the protocol sibling of `VisionLLM` — bring your own async provider (Claude vision via `anthropic`, OpenAI, …).\n\n## Architecture\n\n```mermaid\nflowchart LR\n    A[KID PDF] --\u003e B[pypdfium2\u003cbr/\u003erasterize pages]\n    A --\u003e C[pypdfium2\u003cbr/\u003etext layer]\n    C --\u003e D[ISIN regex\u003cbr/\u003e+ Luhn check]\n    B --\u003e E[VisionLLM\u003cbr/\u003eanalyze_images]\n    D -.-\u003e E\n    E --\u003e F[PeaVerdict\u003cbr/\u003eeligible / replication\u003cbr/\u003eisin / evidence]\n    F --\u003e G[VerdictCache\u003cbr/\u003esha256-keyed]\n    G --\u003e H[Your app:\u003cbr/\u003eCLI / Streamlit / FastAPI / …]\n\n    
style E fill:#dbeafe,stroke:#1e40af\n    style D fill:#dcfce7,stroke:#166534\n    style F fill:#fef3c7,stroke:#854d0e\n```\n\nThe LLM judges *what the document says*; deterministic regex + Luhn reconciles the ISIN string (vision is fuzzy on alphanumerics). The cache is opt-in — pass `cache=None` for a stateless library.\n\nTwo protocols make it extensible without forking:\n\n### `VisionLLM` — swap the model\n\n```python\nfrom typing import Any, Protocol\n\nclass VisionLLM(Protocol):\n    def analyze_images(\n        self,\n        images: list[bytes],\n        prompt: str,\n        schema: dict[str, Any],\n        system: str | None = None,\n    ) -\u003e dict[str, Any]: ...\n```\n\nThe default `OllamaCloudClient` wraps Gemma 4 via Ollama Cloud with `tenacity` retries on transient errors and optional Langfuse tracing. Anyone can implement this protocol to plug in Claude vision, GPT-4o, Gemini, a local Ollama instance, etc. An `AsyncVisionLLM` sibling exists for async backends.\n\n### `KIDSource` — add issuers\n\n```python\nfrom pea_audit.sources import register_source, KIDSource, get_source, all_sources\n```\n\nA registry of ticker → KID URL mappings. It ships built-ins for Amundi (URL pattern) and BNP Paribas (per-fund UUIDs). URL helpers for BlackRock/iShares and Vanguard are importable but don't auto-register: most of their funds are PEA-ineligible, so they exist mainly for testing the negative path.\n\n## Eval baseline\n\nThe repo ships **13 regression cases** under `evals/cases/*.yaml` — 7 PEA-eligible synthetic-swap, 6 ineligible physical non-EEA — covering Amundi, BNP, BlackRock/iShares, Vanguard. Current baseline on Gemma 4 31b-cloud: **13/13 (100%)**. 
Run before any prompt or model change:\n\n```bash\npython evals/run.py                  # compares against evals/baseline.json\npython evals/run.py --save-baseline  # snapshot the current pass-set\n```\n\nExit code `2` on regression (a previously-passing case now fails) — wire into CI to gate prompt/model changes.\n\n## What does it cost?\n\nDefault backend is Gemma 4 31b-cloud via Ollama Cloud:\n\n| Operation | Approx. cost | Notes |\n|---|---|---|\n| One audit (cold cache) | ~$0.02 | 1 PDF, ~3 pages, vision model |\n| One audit (cache hit) | $0 | sha256 lookup, no LLM call |\n| Full eval suite (13 cases, cold) | ~$0.25 | Once per prompt/model change |\n| Monthly portfolio re-audit (4 funds, force-refresh) | ~$0.10 | One scheduled run per month |\n\nBring your own LLM via `VisionLLM` and the cost equation becomes *your provider's per-image price × ~3 pages per KID*. The library doesn't add overhead beyond one model call per audit.\n\n## Production niceties\n\n- **Retries on transient errors** — `tenacity` with exponential backoff (1s → 4s → 16s), only on network/timeout/5xx (not on 4xx or schema errors that won't self-resolve)\n- **SSRF guard on downloads** — `_download_kid` rejects non-`http(s)` schemes, enforces a 20 MB streaming cap, verifies `Content-Type` looks like PDF before writing to disk (matters because `KIDSource.url` is user-registrable)\n- **Optional observability** — Langfuse traces per LLM call (model, input/output, tokens, latency). Activates when `LANGFUSE_PUBLIC_KEY`/`LANGFUSE_SECRET_KEY` are set, silent no-op otherwise\n- **Deterministic ISINs** — vision misreads of the 12-char ISIN string are corrected by regex-extracting candidates from the PDF text layer and validating with the Luhn check digit\n- **Versioned prompts** — `pea_audit/prompts/audit_v{N}.md` files, selected via `prompt_version=` parameter; rollback is a config change, not a code edit. 
The version is part of the cache key, so upgrading a prompt automatically invalidates stale verdicts\n- **Hard vs soft fields in diffs** — `compare_verdicts()` defaults to comparing only categorical fields (`eligible`, `replication`, `isin`) so monthly re-audit doesn't false-fire on LLM rephrasing of free-text issuer/index names\n\n## Known limitations\n\n- **LLM variance** — across repeated runs with the same prompt + model + PDFs, the eval pass rate oscillates between 11/13 and 13/13. The two flaky cases (iShares Core S\u0026P 500 + iShares Nasdaq 100) sometimes return `replication: unknown` instead of `physical`; the LLM's `summary_fr` still reasons correctly, only the structured field wavers. **Eligibility verdict is stable** in all observed runs. Candidate v0.3 fix: multi-sample voting (run N=3, take majority).\n- **Vision-only ISIN reads can drift** — mitigated by the Luhn-validated text-layer extractor, but scanned PDFs without a text layer fall back to the LLM's vision read and can be wrong on alphanumeric ISINs (e.g. `FR001400U5Q4` → `FR00140056U4`). The verdict \u0026 replication fields are reliable; the ISIN field on scanned PDFs is best-effort.\n- **The audit verdict is LLM-judged**, not regulatory advice. The library cites the actual KID text so you can verify — always do, especially before buying.\n\n## Reference app: ETFTracker\n\nThe repo also ships a personal-tool app that consumes the library: a French ETF portfolio tracker with a Streamlit dashboard, monthly re-audit cron, FastAPI service, and Docker compose deployment. 
See [ETFTracker.md](ETFTracker.md) (French) for that side.\n\nTo run it:\n\n```bash\ncp positions.csv.example positions.csv   # edit with your own holdings\ncp .env.example .env                     # add your OLLAMA_API_KEY\ndocker compose up -d web                 # → http://localhost:8502\n# or:  streamlit run dashboard.py\n```\n\n![Dashboard](https://raw.githubusercontent.com/AndreLiar/pea-audit/main/docs/screenshot-dashboard.png)\n\n\u003e *Streamlit \"Portefeuille\" tab — 4 holdings with live yfinance prices and PEA-eligibility badges (✅ from the audit cache).*\n\n### Not a developer?\n\nThree options if you don't write Python but want to check your PEA:\n\n1. **Run the dashboard locally** with the 3 commands above. No code edits required after `positions.csv` is filled.\n2. **Upload via the HTTP API** — `docker compose up -d api` then `POST /audit/upload` with a PDF (Swagger docs at http://localhost:8080/docs).\n3. **Hire a developer** — realistically the best option for non-technical PEA holders. The library exists so a hosted version of this is buildable in a weekend.\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md). Maintainer? See [PUBLISHING.md](PUBLISHING.md). Release history: [CHANGELOG.md](CHANGELOG.md).\n\n## License\n\n[MIT](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreliar%2Fpea-audit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandreliar%2Fpea-audit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreliar%2Fpea-audit/lists"}