https://github.com/alerque/acceptarium

# acceptarium

A collection of tooling to facilitate scanning receipts, extracting useful data, archiving the assets, and importing the results into [Plain Text Accounting][pta] systems.

accipiō
: (*Classical Latin*) [akˈkɪ.pi.oː] to receive, accept

acceptarius
: (*Latin*) allotment-holding
: (*Medieval*) receipt book

----

# Overview

```mermaid
---
config:
  layout: elk
  look: handDrawn
  theme: redux-dark
---
flowchart LR
A["Ingest/Scan"]
B["ID (Store)"]
C["Traditional OCR"]
D["Regex Extract"]
E["Rules"]
F["Review/Edit"]
G["Export"]
L1["LLM Vision"]
L2["LLM Extract"]
L3["Retrain"]
A --> B --> C & L1 --> D & L2 --> F --> G
F --> E & L3
E --> D
L3 --> L2
style L1 stroke-dasharray: 5
style L2 stroke-dasharray: 5
style L3 stroke-dasharray: 5
```

1. Scan or import scanned receipts, individually or in bulk.
1. Store identifiable scanned assets using [Git Annex][gitannex] or pluggable backends (LFS? WebDAV?).
1. **Optionally** extract data via OCR using local LLM tooling ([Ollama][ollama] or pluggable remote tooling).
1. **Optionally** automatically process data into structured transaction info (via local LLM tooling or pattern matching).
1. Facilitate either manual data entry or automatic data extraction with review and a chance to edit.
1. **Optionally** use final data to update regex rules or retrain the LLM to improve future extractions.
1. Export extracted data as transaction(s) via CSV? JSON? (or possibly directly to a journal for [HLedger][hledger], [Ledger CLI][ledgercli], [Beancount][beancount], etc.).
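
As a rough illustration of the pattern-matching and export steps, the sketch below pulls a date and total out of OCR text with regular expressions and renders a two-posting entry in the common Ledger/hledger journal style. The rule names, the `extract` and `to_journal` helpers, the account names, and the sample receipt are all hypothetical; nothing here reflects how acceptarium itself is or will be implemented.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical rule set: field name -> pattern with one capture group.
RULES = {
    "date": re.compile(r"(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"(?im)^\s*total\s*:?\s*\$?\s*(\d+\.\d{2})\s*$"),
}


def extract(ocr_text: str) -> dict:
    """Apply each rule to the OCR text; unmatched fields are left for manual review."""
    found = {}
    for field, pattern in RULES.items():
        match = pattern.search(ocr_text)
        if match:
            found[field] = match.group(1)
    return found


def to_journal(fields: dict, payee: str, expense: str, source: str) -> str:
    """Render a two-posting entry; the second posting's amount is left implicit."""
    when = date.fromisoformat(fields["date"])  # validates the extracted date
    amount = Decimal(fields["total"])          # avoids float rounding on money
    return (
        f"{when.isoformat()} {payee}\n"
        f"    {expense}  {amount} USD\n"
        f"    {source}\n"
    )


# Made-up OCR output for a made-up receipt.
receipt = """CORNER GROCERY
2024-03-15 14:02
Milk        3.49
Bread       2.50
TOTAL:      5.99
"""

fields = extract(receipt)
entry = to_journal(fields, "Corner Grocery", "expenses:groceries", "assets:cash")
print(entry)
```

Keeping the rules in a plain mapping means a correction made during review could, in principle, be fed back as a new or adjusted pattern without touching the extraction loop.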

## Goals

* Automate as many steps as possible to make it easy to handle receipts (and possibly invoices, etc.) in bulk.
* Disable all LLM-related features by default and remain fully functional without them; using them requires explicit opt-in.
* Use only local-first, privacy-preserving tooling by default, even where LLMs may be involved.
* Facilitate human review/approval and fully featured editing for any non-deterministic steps, such as LLM- or OCR-based metadata extraction.
* Allow re-processing data from initial assets in the event of improved tooling (better OCR, more journal import rules, etc.).
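
One way to picture the opt-in and local-first goals is a settings object whose LLM switches default to off. The sketch below is only an assumption about how that could look: the `Settings` class, its field names, and `planned_stages` are invented for illustration, and the stage names are borrowed from the overview diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; both LLM switches default to off."""

    llm_enabled: bool = False       # explicit opt-in required
    llm_allow_remote: bool = False  # separate opt-in; local-first otherwise


def planned_stages(settings: Settings) -> list[str]:
    """List the stages that would run, using the names from the overview diagram."""
    stages = ["Ingest/Scan", "ID (Store)", "Traditional OCR", "Regex Extract"]
    if settings.llm_enabled:
        backend = "remote" if settings.llm_allow_remote else "local"
        stages += [f"LLM Vision ({backend})", f"LLM Extract ({backend})"]
    # Human review always sits between extraction and export.
    stages += ["Review/Edit", "Export"]
    return stages


default_plan = planned_stages(Settings())
opt_in_plan = planned_stages(Settings(llm_enabled=True))
print(default_plan)
print(opt_in_plan)
```

With the defaults, no LLM stage appears in the plan at all; enabling LLMs still selects a local backend unless the remote switch is flipped as well.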

## Non-goals

* Lock-in to any particular PTA solution: pair with [HLedger][hledger], [Ledger CLI][ledgercli], [Beancount][beancount], or similar journal tools.
* Dictating the entire accounting workflow: people already have their own data handling; we just want to mix in digitized assets.
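
To make the lock-in point concrete, a reviewed transaction could be emitted in a tool-neutral shape and left to each PTA tool's own import path. The sketch below serializes one record as JSON and CSV using only the Python standard library; the field names, the `asset_id` value, and both helper functions are hypothetical, not a format the project has committed to.

```python
import csv
import io
import json

# Hypothetical field names for a tool-neutral transaction record.
FIELDS = ["date", "payee", "amount", "currency", "asset_id"]

record = {
    "date": "2024-03-15",
    "payee": "Corner Grocery",
    "amount": "5.99",           # kept as a string to avoid float rounding
    "currency": "USD",
    "asset_id": "receipt-0001",  # made-up reference back to the stored scan
}


def to_json(rec: dict) -> str:
    """Serialize one record with stable key order."""
    return json.dumps(rec, sort_keys=True)


def to_csv(records: list[dict]) -> str:
    """Serialize records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


json_out = to_json(record)
csv_out = to_csv([record])
print(json_out)
print(csv_out)
```

Carrying an identifier for the original scan in every exported row keeps the link between a journal entry and its archived asset, whichever journal tool ends up consuming the data.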

[beancount]: https://beancount.io/
[gitannex]: https://git-annex.branchable.com/
[hledger]: https://hledger.org/
[ledgercli]: https://ledger-cli.org/
[ollama]: https://ollama.com/
[pta]: https://plaintextaccounting.org/