{"id":48218994,"url":"https://github.com/notoriouslab/gmail-statement-fetcher","last_synced_at":"2026-04-04T19:03:30.398Z","repository":{"id":342993095,"uuid":"1175857537","full_name":"notoriouslab/gmail-statement-fetcher","owner":"notoriouslab","description":"Downloads bank statement PDFs from Gmail. Config-driven — add any bank via JSON, no code changes. IMAP or OAuth 2.0.自動從 Gmail 下載銀行對帳單 PDF。規則由 JSON 設定檔驅動，新增銀行不需改程式碼。支援 IMAP 與 OAuth 2.0，內建去重機制。","archived":false,"fork":false,"pushed_at":"2026-04-02T05:36:42.000Z","size":111,"stargazers_count":15,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-02T18:58:04.650Z","etag":null,"topics":["automation","bank-statement","gmail","imap","oauth","pdf","python","taiwan"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/notoriouslab.png","metadata":{"files":{"readme":"README.en.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-08T09:18:22.000Z","updated_at":"2026-04-02T05:36:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/notoriouslab/gmail-statement-fetcher","commit_stats":null,"previous_names":["notoriouslab/gmail-statement-fetcher"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/notoriouslab/gmail-statement-fetcher","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fgmail-statement-fetcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fgmail-statement-fetcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fgmail-statement-fetcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fgmail-statement-fetcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/notoriouslab","download_url":"https://codeload.github.com/notoriouslab/gmail-statement-fetcher/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fgmail-statement-fetcher/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31409471,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","bank-statement","gmail","imap","oauth","pdf","python","taiwan"],"created_at":"2026-04-04T19:03:30.257Z","updated_at":"2026-04-04T19:03:30.381Z","avatar_url":"https://github.com/notoriouslab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gmail-statement-fetcher\n\nAutomatically download bank/financial statement PDFs from Gmail — config-driven, deduplication built-in, dual IMAP/OAuth support.\n\n**Requires Python 3.9+** · Part of the [notoriouslab](https://github.com/notoriouslab) open-source toolkit.\n\n\u003e [繁體中文 README](README.md)\n\n---\n\n## Why This Tool\n\nMost Gmail-based statement tools are one-off scripts tied to a single bank.\nThis one is **config-driven**: add any bank without touching code. Any AI agent framework can call it via shell — a `SKILL.md` is included for direct [OpenClaw](https://openclaw.ai/) integration.\n\n| Feature | Description |\n|---------|-------------|\n| Multi-bank | JSON config, no code changes to add a bank |\n| Deduplication | UID-based, never re-downloads the same email |\n| IMAP mode | stdlib only, zero install, headless-friendly |\n| OAuth 2.0 | `gmail.readonly` scope |\n| ZIP extraction | stdlib, ZIP bomb–protected (100 MB cap) |\n| PDF decryption | optional pikepdf; passwords in `.env`, not config |\n| Normalized filenames | `永豐銀行_信用卡對帳單_2026_02.pdf` |\n| Dry-run preview | `--dry-run` shows matches without downloading |\n| Atomic writes | `tempfile` + `os.replace()` — no partial files |\n| Privacy-safe dedup | stores subject SHA-256 hashes, not raw subjects |\n| Security hardened | token 0o600, log masking, log injection stripped |\n\n---\n\n## Quick Start\n\n```bash\n# 1. Clone\ngit clone https://github.com/notoriouslab/gmail-statement-fetcher.git\ncd gmail-statement-fetcher\n\n# 2. Copy and edit config\ncp config.example.json config.json\n# Edit config.json — add your bank's sender domain and subject keywords\n\n# 3. Set credentials\ncp .env.example .env\n# Edit .env — fill in GMAIL_USER and GMAIL_APP_PASSWORD\n\n# 4a. IMAP mode — no extra install needed\npip install python-dotenv   # optional but recommended\npython3 fetcher.py\n\n# 4b. OAuth mode — install dependencies first\npip install google-auth-oauthlib google-api-python-client python-dotenv\n# Set AUTH_METHOD=oauth in .env, place credentials.json in project root\npython3 fetcher.py\n\n# Output: ./downloads/永豐銀行_銀行對帳單_2026_02.pdf\n```\n\nPreview matched emails without downloading:\n\n```bash\npython3 fetcher.py --dry-run --verbose\n```\n\n---\n\n## Authentication\n\n### IMAP + App Password — recommended for servers\n\nHeadless-friendly, no browser needed, stdlib only.\n\n1. Enable 2FA on your Google account\n2. Go to **Security → App Passwords**, create one for \"Mail\"\n3. Set in `.env`: `AUTH_METHOD=imap`, `GMAIL_USER`, `GMAIL_APP_PASSWORD`\n\n### OAuth 2.0 — recommended for personal use\n\nUses `gmail.readonly` scope — more secure, but requires one-time browser authorization.\n\n1. Create a project in [Google Cloud Console](https://console.cloud.google.com/)\n2. Enable the Gmail API\n3. Create OAuth credentials (Desktop app) → download `credentials.json` → **place in project root**\n4. Install: `pip install google-auth-oauthlib google-api-python-client`\n5. Set `AUTH_METHOD=oauth` in `.env`\n6. First run opens a browser for authorization → generates `token.json`\n\n\u003e **Headless servers**: After the first OAuth run on a local machine, copy `token.json` to your server and set `OAUTH_TOKEN=/path/to/token.json`. Keep this file backed up — losing it requires re-authorization.\n\n---\n\n## Configuration\n\n### Config file format\n\n```jsonc\n{\n  \"banks\": {\n    \"my_bank\": {\n      \"name\": \"My Bank\",                         // display name\n      \"short_name\": \"MyBank\",                    // used in filename prefix\n      \"imap_search\": {\n        \"sender_keywords\": [\"mybank.com\"],       // match From header (domain boundary)\n        \"subject_keywords\": [\"e-Statement\"],     // AND logic with sender\n        \"exclude_attachment_patterns\": [\"terms\"] // skip attachments matching these\n      },\n      \"doc_type_rules\": [                        // first match wins\n        {\"keyword\": \"credit card\", \"type\": \"CreditCard\"},\n        {\"keyword\": \"e-Statement\", \"type\": \"BankStatement\"}\n      ],\n      \"default_doc_type\": \"Statement\",           // fallback doc type\n      \"subject_date_pattern\": \"(\\\\d{4})[-/](\\\\d{2})\", // regex for YYYY/MM from subject\n      \"pdf_password\": \"\",   // leave empty — use .env instead (see below)\n      \"zip_password\": \"\"    // leave empty — use .env instead (see below)\n    }\n  },\n  \"global_settings\": {\n    \"lookback_days\": 60,    // scan window in days\n    \"retention_days\": 180   // dedup record lifetime in days\n  }\n}\n```\n\n\u003e Keys starting with `_` (e.g. `_example_en`) are ignored — use for disabled or template entries.\n\u003e See `config.example.json` for ready-to-use Taiwan bank configs.\n\n**Filename format**: `{short_name}_{doc_type}_{YYYY}_{MM}.pdf`\n\nMonth is always zero-padded (`_02_` not `_2_`). `subject_date_pattern` captures raw digits; the fetcher normalises them automatically.\n\n### Secret management\n\n**All passwords belong in `.env`, not `config.json`.**\n\n```\n# .env example\nGMAIL_USER=you@gmail.com\nGMAIL_APP_PASSWORD=xxxx-xxxx-xxxx-xxxx\n\n# Per-bank passwords: {BANK_KEY_UPPERCASED}_{PDF_PASSWORD|ZIP_PASSWORD}\nSINOPAC_PDF_PASSWORD=your-sinopac-pdf-password\nCTBC_ZIP_PASSWORD=your-ctbc-zip-password\n```\n\nEnv vars take precedence over `config.json`. Keeping `config.json` password-free means it is safe to share or version-control. The fetcher warns at startup if it detects passwords in `config.json`.\n\nBoth `config.json` and `.env` are excluded by `.gitignore`. Only `config.example.json` (no real passwords) should be committed.\n\n---\n\n## ZIP \u0026 PDF Password Support\n\nSome banks deliver statements as password-protected ZIPs or PDFs.\n\n**ZIP** (stdlib, no extra install):\n```bash\n# Set in .env\nCTBC_ZIP_PASSWORD=your-zip-password\n```\n\n**PDF decryption** (requires pikepdf):\n```bash\npip install pikepdf~=9.0\n```\n```bash\n# Set in .env\nSINOPAC_PDF_PASSWORD=your-pdf-password\n```\n\nFormat: `{BANK_KEY_UPPERCASED}_{PDF_PASSWORD|ZIP_PASSWORD}` — takes precedence over `config.json`.\n\nIf `pikepdf` is not installed and a PDF password is set, the encrypted PDF is saved as-is with a warning.\n\n---\n\n## CLI Options\n\n```\npython fetcher.py [options]\n\n  --config      path to config JSON                    (default: \u003cscript dir\u003e/config.json)\n  --output-dir  directory to save PDFs                 (default: \u003cscript dir\u003e/downloads)\n  --state-file  path to UID dedup store JSON           (default: \u003coutput-dir\u003e/.processed_uids.json)\n  --auth        imap | oauth                           (overrides AUTH_METHOD env var)\n  --dry-run     preview matched emails without downloading\n  --verbose     enable debug logging\n  --version     print version and exit\n```\n\n### Exit codes\n\n| Code | Meaning |\n|------|---------|\n| 0 | Completed successfully |\n| 1 | Runtime error (IMAP/OAuth failure, config missing) |\n\n---\n\n## Cron / Scheduling\n\n**Recommended: install python-dotenv**\n\n```bash\npip install python-dotenv\n```\n\nThe fetcher calls `load_dotenv()` automatically — no manual `export` needed in cron.\n\n```bash\n# Run daily at 09:00\n0 9 * * * cd /path/to/gmail-statement-fetcher \u0026\u0026 python3 fetcher.py\n```\n\n**Without python-dotenv** (`export $(cat .env | xargs)` breaks on passwords with `$`, spaces, or `#`):\n\n```bash\n#!/bin/bash\n# run_fetcher.sh\nset -a\nsource \"$(dirname \"$0\")/.env\"\nset +a\nexec python3 \"$(dirname \"$0\")/fetcher.py\" \"$@\"\n```\n\nFor OAuth on headless servers, set `OAUTH_TOKEN` to the full path of `token.json`.\n\n---\n\n## Security\n\n- **Atomic writes**: all PDF saves use `tempfile.mkstemp` + `os.replace()` — no partial files\n- **Privacy-safe dedup**: `.processed_uids.json` stores SHA-256 subject hashes, not raw subjects\n- **Secret isolation**: passwords in `.env` only; startup warns if `config.json` contains secrets\n- **Token permissions**: `token.json` saved at `0o600`\n- **Username masking**: Gmail address logged as first 3 chars + `***`\n- **Domain boundary matching**: sender matching uses `@`/`.` prefix to reduce false positives\n- **ZIP bomb protection**: decompression capped at 100 MB (streaming guard, ignores header file_size)\n- **Log injection prevention**: email subjects sanitised before logging\n\nSee [SECURITY.md](SECURITY.md) for the full security policy.\n\n---\n\n## AI Agent Integration\n\nStandard CLI tool — any AI agent framework can invoke it via shell. `SKILL.md` is included for [OpenClaw](https://openclaw.ai/) integration.\n\n```bash\n# Dry-run first to confirm matches, then download\npython3 fetcher.py --dry-run --verbose\npython3 fetcher.py --output-dir ./downloads\n```\n\n---\n\n## Part of the notoriouslab Pipeline\n\n```\ngmail-statement-fetcher   →  download PDF statements from Gmail\n        ↓\n   doc-cleaner             →  PDF/DOCX/XLSX → structured Markdown\n        ↓\n   personal-cfo            →  monthly audit + retirement glide path\n```\n\nEach tool works standalone. Together they form a full personal finance automation pipeline.\n\n---\n\n## Contributing\n\nThe easiest contribution is adding a bank config entry — no code changes needed:\n\n1. Fork and create a branch: `git checkout -b add-\u003cbank-name\u003e`\n2. Add an entry to `config.example.json`\n3. Test with `python fetcher.py --dry-run`\n4. Open a PR: `config: add \u003cBank Name\u003e`\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md).\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotoriouslab%2Fgmail-statement-fetcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnotoriouslab%2Fgmail-statement-fetcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotoriouslab%2Fgmail-statement-fetcher/lists"}