https://github.com/alerque/acceptarium
Tools to facilitate scanning receipts, extracting useful data, archiving the assets, and importing the results into plain text accounting systems.
beancount cli git-annex hledger ledger-cli llm ocr plaintext-accounting
- Host: GitHub
- URL: https://github.com/alerque/acceptarium
- Owner: alerque
- Created: 2026-03-13T22:21:54.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2026-04-25T11:31:48.000Z (about 2 months ago)
- Last Synced: 2026-04-29T22:45:50.194Z (about 1 month ago)
- Topics: beancount, cli, git-annex, hledger, ledger-cli, llm, ocr, plaintext-accounting
- Language: Rust
- Homepage: https://codeberg.org/plaintextaccounting/acceptarium
- Size: 765 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSES/0BSD.txt
README
# acceptarium
A collection of tooling to facilitate scanning receipts, extracting useful data, archiving the assets, and importing the results into [Plain Text Accounting][pta] systems.
accipiō
: (*Classical Latin*) [akˈkɪ.pi.oː] to receive, accept
acceptarius
: (*Latin*) allotment-holding
: (*Medieval*) receipt book
----
# Overview
```mermaid
---
config:
  layout: elk
  look: handDrawn
  theme: redux-dark
---
flowchart LR
A["Ingest/Scan"]
B["ID (Store)"]
C["Traditional OCR"]
D["Regex Extract"]
E["Rules"]
F["Review/Edit"]
G["Export"]
L1["LLM Vision"]
L2["LLM Extract"]
L3["Retrain"]
A --> B --> C & L1 --> D & L2 --> F --> G
F --> E & L3
E --> D
L3 --> L2
style L1 stroke-dasharray: 5
style L2 stroke-dasharray: 5
style L3 stroke-dasharray: 5
```
1. Scan or import scanned receipts, individually or in bulk.
1. Store identifiable scanned assets using [Git Annex][gitannex] or pluggable backends (LFS? WebDAV?).
1. **Optionally** extract data via OCR using local LLM tooling ([Ollama][ollama] or pluggable remote tooling).
1. **Optionally** automatically process data into structured transaction info (via local LLM tooling or pattern matching).
1. Facilitate either manual data entry or automatic data extraction with review and a chance to edit.
1. **Optionally** use the final data to update regex rules or retrain the LLM to improve future extractions.
1. Export extracted data as transaction(s) via CSV? JSON? (or possibly directly to a journal for [HLedger][hledger], [Ledger CLI][ledgercli], [Beancount][beancount], etc.).
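
As a rough illustration of the deterministic path through these steps (pattern matching followed by export), here is a minimal Rust sketch. It is not acceptarium's actual code: the `Receipt` struct, the function names, the account names, and the "first line starting with TOTAL" heuristic are all assumptions made for the example. The output uses the simple two-posting journal layout that HLedger and Ledger CLI both accept; Beancount would need a different syntax.

```rust
/// Hypothetical structured data for one receipt (not acceptarium's real schema).
#[derive(Debug, PartialEq)]
struct Receipt {
    date: String,     // ISO 8601 date, e.g. "2026-03-13"
    payee: String,
    total_cents: u64, // integer cents, to avoid floating-point rounding
    currency: String,
}

/// Parse a decimal amount such as "12", "12.5" or "12.34" into cents.
fn parse_cents(token: &str) -> Option<u64> {
    let (whole, frac) = token.split_once('.').unwrap_or((token, ""));
    if frac.len() > 2 || !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let whole: u64 = whole.parse().ok()?;
    let frac_cents: u64 = match frac.len() {
        0 => 0,
        1 => frac.parse::<u64>().ok()? * 10,
        _ => frac.parse().ok()?,
    };
    whole.checked_mul(100)?.checked_add(frac_cents)
}

/// Assumed heuristic: take the first line that starts with "TOTAL"
/// (case-insensitive) and read its last whitespace-separated token as the amount.
fn extract_total_cents(ocr_text: &str) -> Option<u64> {
    let line = ocr_text
        .lines()
        .map(str::trim)
        .find(|l| l.to_ascii_uppercase().starts_with("TOTAL"))?;
    let token = line.split_whitespace().last()?;
    parse_cents(token.trim_start_matches('$'))
}

/// Render a two-posting journal entry; the second posting's amount is left
/// out so the journal tool can infer the balancing value.
fn to_journal_entry(r: &Receipt, expense_account: &str, payment_account: &str) -> String {
    format!(
        "{} {}\n    {}  {}.{:02} {}\n    {}\n",
        r.date,
        r.payee,
        expense_account,
        r.total_cents / 100,
        r.total_cents % 100,
        r.currency,
        payment_account
    )
}
```

In this sketch a human reviewer would still confirm the extracted total before `to_journal_entry` is called, in line with the review step above.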
## Goals
* Automate as many steps as possible to make it easy to handle receipts (and possibly invoices, etc.) in bulk.
* Disable all LLM-related features by default, require explicit opt-in to use them, and remain fully functional without them.
* Use only local-first privacy-preserving tooling by default — even where LLMs may be involved.
* Facilitate human review/approval and fully featured editing for any non-deterministic steps like LLM- or OCR-based metadata extraction.
* Allow re-processing data from initial assets in the event of improved tooling (better OCR, more journal import rules, etc.).
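
One way the opt-in goal could look in code is sketched below. This is only an assumption about a possible design, not the project's real configuration: the `Config` and `Stage` types, their field names, and the `extraction_stages` helper are hypothetical. The endpoint shown is the default local address Ollama listens on, in keeping with the local-first goal.

```rust
/// Hypothetical extraction stages, named after the boxes in the overview diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    TraditionalOcr,
    RegexExtract,
    LlmVision,
    LlmExtract,
}

/// Hypothetical configuration; field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Config {
    llm_enabled: bool,
    llm_endpoint: String,
}

impl Default for Config {
    /// LLM features stay off unless the user explicitly turns them on.
    fn default() -> Self {
        Config {
            llm_enabled: false,
            // Local Ollama default address; only used once the user opts in.
            llm_endpoint: String::from("http://localhost:11434"),
        }
    }
}

/// Deterministic stages always run; LLM stages are added only on opt-in.
fn extraction_stages(cfg: &Config) -> Vec<Stage> {
    let mut stages = vec![Stage::TraditionalOcr, Stage::RegexExtract];
    if cfg.llm_enabled {
        stages.push(Stage::LlmVision);
        stages.push(Stage::LlmExtract);
    }
    stages
}
```

Keeping the default in a `Default` implementation means that any code path that forgets to set the flag ends up with the LLM stages disabled.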
## Non-goals
* Locking users into any particular PTA solution (pair with [HLedger][hledger], [Ledger CLI][ledgercli], [Beancount][beancount], or similar journal tools).
* Dictating the entire accounting workflow; people already have their own data handling, and we just want to mix in digitized assets.
[beancount]: https://beancount.io/
[gitannex]: https://git-annex.branchable.com/
[hledger]: https://hledger.org/
[ledgercli]: https://ledger-cli.org/
[ollama]: https://ollama.com/
[pta]: https://plaintextaccounting.org/