{"id":51305113,"url":"https://github.com/coko7/invoice2json","last_synced_at":"2026-06-30T23:03:17.190Z","repository":{"id":367914926,"uuid":"1282778160","full_name":"coko7/invoice2json","owner":"coko7","description":"🧾 Extract structured JSON from invoices and receipts using a local Ollama vision model — no cloud APIs required.","archived":false,"fork":false,"pushed_at":"2026-06-28T07:50:14.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-28T09:22:13.260Z","etag":null,"topics":["ai","document-extraction","fastapi","invoice","invoice-processing","json","llm","local-ai","ocr","ollama","pdf","receipt","receipt-parsing","structured-data","vibe-scaffolded","vision-model"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coko7.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-28T07:21:15.000Z","updated_at":"2026-06-28T07:50:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/coko7/invoice2json","commit_stats":null,"previous_names":["coko7/invoice2json"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/coko7/invoice2json","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coko7%2Finvoice2json","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coko7%2Finvoice2json/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coko7%2Finvoice2json/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coko7%2Finvoice2json/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coko7","download_url":"https://codeload.github.com/coko7/invoice2json/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coko7%2Finvoice2json/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34986248,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-30T02:00:05.919Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","document-extraction","fastapi","invoice","invoice-processing","json","llm","local-ai","ocr","ollama","pdf","receipt","receipt-parsing","structured-data","vibe-scaffolded","vision-model"],"created_at":"2026-06-30T23:03:16.503Z","updated_at":"2026-06-30T23:03:17.175Z","avatar_url":"https://github.com/coko7.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧾 invoice2json\n\nTurns a PDF invoice or photo of a receipt into structured JSON using a local\n[Ollama](https://ollama.com) vision model — no cloud APIs, no data leaving your machine.\n\n---\n\n## Quick start\n\n### 1. Prerequisites\n\n- Python 3.10+\n- [Ollama](https://ollama.com) running locally (`ollama serve`)\n- A vision-capable model pulled, e.g.:\n\n```bash\nollama pull llava          # default model used by the API\n# or\nollama pull llava:13b      # more accurate, needs ~10 GB VRAM\n# or\nollama pull moondream      # very fast, lower accuracy\n```\n\nFor PDF support you also need **poppler-utils** (only needed if you don't install `pypdfium2`):\n\n```bash\n# macOS\nbrew install poppler\n\n# Ubuntu / Debian\nsudo apt install poppler-utils\n```\n\n### 2. Install Python dependencies\n\n```bash\npip install -r requirements.txt\n\n# Optional: faster PDF rasterisation\npip install pypdfium2\n```\n\n### 3. Run the server\n\n```bash\nuvicorn main:app --reload --port 8000\n```\n\nInteractive docs: \u003chttp://localhost:8000/docs\u003e\n\n---\n\n## API reference\n\n### `POST /extract`\n\nUpload a file and get back structured JSON.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `file` | form-data file | — | PDF, PNG, JPG, or WEBP |\n| `model` | query string | `llava` | Ollama model name |\n\n**Example — curl:**\n\n```bash\n# Image\ncurl -s -X POST \"http://localhost:8000/extract\" \\\n     -F \"file=@receipt.jpg\" | jq .\n\n# PDF\ncurl -s -X POST \"http://localhost:8000/extract\" \\\n     -F \"file=@invoice.pdf\" | jq .\n\n# Use a different model\ncurl -s -X POST \"http://localhost:8000/extract?model=llava:13b\" \\\n     -F \"file=@invoice.pdf\" | jq .\n```\n\n**Example — Python:**\n\n```python\nimport httpx\n\nwith open(\"invoice.pdf\", \"rb\") as f:\n    resp = httpx.post(\n        \"http://localhost:8000/extract\",\n        files={\"file\": (\"invoice.pdf\", f, \"application/pdf\")},\n    )\nresp.raise_for_status()\ndata = resp.json()\nprint(data[\"total_amount\"])\nprint(data[\"line_items\"])\n```\n\n**Example response:**\n\n```json\n{\n  \"document_type\": \"invoice\",\n  \"vendor\": {\n    \"name\": \"Acme Corp\",\n    \"address\": \"123 Main St, Springfield, IL 62701\",\n    \"phone\": \"555-867-5309\",\n    \"email\": \"billing@acme.example\",\n    \"website\": null,\n    \"tax_id\": \"12-3456789\"\n  },\n  \"customer\": {\n    \"name\": \"Jane Smith\",\n    \"address\": \"456 Oak Ave, Portland, OR 97201\",\n    \"email\": null,\n    \"account_number\": \"CUST-00412\"\n  },\n  \"document_number\": \"INV-2024-0087\",\n  \"document_date\": \"2024-03-15\",\n  \"due_date\": \"2024-04-14\",\n  \"currency\": \"USD\",\n  \"line_items\": [\n    { \"description\": \"Widget Pro × 3\", \"quantity\": 3, \"unit_price\": 49.99, \"total\": 149.97 },\n    { \"description\": \"Shipping\", \"quantity\": 1, \"unit_price\": 9.95, \"total\": 9.95 }\n  ],\n  \"subtotal\": 159.92,\n  \"tax_rate\": 8.5,\n  \"tax_amount\": 13.59,\n  \"discount_amount\": null,\n  \"total_amount\": 173.51,\n  \"amount_paid\": 0,\n  \"amount_due\": 173.51,\n  \"payment_method\": null,\n  \"notes\": \"Net 30 payment terms\",\n  \"_source_file\": \"invoice.pdf\",\n  \"_model_used\": \"llava\",\n  \"_pages_processed\": 1\n}\n```\n\n### `GET /models`\n\nLists all models currently available in your local Ollama instance.\n\n```bash\ncurl http://localhost:8000/models | jq .\n```\n\n### `GET /health`\n\nLiveness check — returns `{\"status\": \"ok\"}`.\n\n---\n\n## Multi-page PDFs\n\nPDFs are rasterised page-by-page (up to 10 pages by default, configurable via\n`MAX_PDF_PAGES` in `main.py`). Line items are merged across all pages; scalar\nfields (totals, vendor name, etc.) use the first non-null value found.\n\n## Tips\n\n- **Accuracy**: `llava:13b` or `llava:34b` are noticeably more accurate than the\n  default 7B for dense invoices. `moondream` is fast but misses fields more often.\n- **Scanned / low-res images**: pre-process with an upscaler or ensure the source\n  is at least 150 DPI before uploading.\n- **Batch processing**: wrap the `/extract` endpoint in an async loop with\n  `asyncio.gather()` or use a task queue (Celery, ARQ) for large volumes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoko7%2Finvoice2json","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoko7%2Finvoice2json","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoko7%2Finvoice2json/lists"}