https://github.com/psynetic-software/camt-parser
CAMT XML Parser for SEPA banking data (C++).
banking camt camt052 camt053 camt054 cpp csv finance mit parser qif sepa swift xml
- Host: GitHub
- URL: https://github.com/psynetic-software/camt-parser
- Owner: psynetic-software
- License: mit
- Created: 2025-11-06T15:08:24.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-11-06T18:18:08.000Z (5 months ago)
- Last Synced: 2025-11-06T18:19:47.467Z (5 months ago)
- Topics: banking, camt, camt052, camt053, camt054, cpp, csv, finance, mit, parser, qif, sepa, swift, xml
- Language: C++
- Homepage: https://www.taxpool.net/
- Size: 34.2 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# camt-parser
A high-performance C++ library for parsing SEPA CAMT XML banking formats.
Supports:
- **camt.052** Bank-to-Customer Account Report
- **camt.053** Bank-to-Customer Statement
- **camt.054** Bank-to-Customer Debit/Credit Notification
The library provides a structured, domain-friendly data model and an optional
CSV export layer optimized for accounting, reconciliation, and audit workloads.
## Features
- Fully supports **camt.052 / .053 / .054**
- Extracts complete transaction details (counterparty, remittance, references)
- Unicode-aware free-text normalization (optional `USE_UTF8PROC`)
- Bank transaction code → **GVC** mapping included
- Optional **canonical transaction hash** for duplicate detection
- CSV export designed for **accounting systems**
- No locale dependencies (monetary values are kept as text until formatted)
- **MIT License** (commercial-friendly)
## Installation / Build
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```
Optional dependencies:
- **pugixml** (MIT) for XML parsing
- **utf8proc** (MIT) for Unicode normalization (if `USE_UTF8PROC` is defined)
## High-Level Data Model
```
Document
 └─ Statement(s)
     └─ Entry(s)
         └─ EntryTransaction(s)
```
### `Document`
Represents a full CAMT file (052 / 053 / 054).
### `Statement`
Contains metadata (account, creation time, balances) and `Entry` records.
### `Entry`
Represents a booked transaction line (booking date, value date, amount).
May contain **multiple** `EntryTransaction` elements.
### `EntryTransaction`
Contains full payment details:
- Counterparty (IBAN/BIC/Name)
- Remittance text (structured + unstructured)
- ISO 20022 BankTransactionCode (Domain / Family / SubFamily)
- Proprietary bank code
- Charges, fees, reversal indicator, etc.
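
As a rough illustration of this hierarchy, the sketch below models the four levels as plain structs and walks them to count transactions. All struct and member names (`SketchDocument`, `transactions`, and so on) are hypothetical and are not taken from the library headers; the real types carry far more fields.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the Document > Statement > Entry > EntryTransaction
// hierarchy. Member names are illustrative, not the library's real fields.
struct SketchTransaction {
    std::string counterparty_name;
    std::string remittance;
};

struct SketchEntry {
    std::string booking_date;                     // e.g. "2025-11-06"
    std::string amount;                           // kept as text
    std::vector<SketchTransaction> transactions;  // one entry may hold several
};

struct SketchStatement {
    std::string account_iban;
    std::vector<SketchEntry> entries;
};

struct SketchDocument {
    std::vector<SketchStatement> statements;
};

// Count every transaction by walking all nesting levels.
std::size_t count_transactions(const SketchDocument& doc) {
    std::size_t n = 0;
    for (const auto& st : doc.statements)
        for (const auto& e : st.entries)
            n += e.transactions.size();
    return n;
}
```

The nested loops reflect the point made above: an `Entry` is a single booked line, but it can fan out into several `EntryTransaction` records (for example, a batch booking).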
## Parsing API
```cpp
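#include <iostream>
#include <string>
// Plus the camt-parser header(s); see examples/main.cpp for the exact names.
camt::Parser p;
camt::Document doc;
std::string err;
// parse_file returns false on failure and writes the reason to err
if (!p.parse_file("statement.xml", doc, &err)) {
    std::cerr << "Parse error: " << err << "\n";
}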
```
## CSV Export
```cpp
#include <fstream>
// Plus the camt-parser header(s); see examples/main.cpp for the exact names.
std::ofstream out("export.csv");
camt::ExportOptions opt;
opt.write_utf8_bom = true; // Excel compatible
opt.remittance_separator = " | "; // readable multi-part purpose text
opt.signed_amount = true;
opt.credit_as_bool = true;
camt::export_entries_csv(doc, &out, nullptr, opt);
```
## Quick Start Example
This example demonstrates parsing a CAMT file and exporting transactions to CSV:
~~~cpp
#include <fstream>
#include <iostream>
#include <string>
// Plus the camt-parser header(s); see examples/main.cpp for the exact names.
int main() {
    camt::Parser parser;
    camt::Document doc;
    std::string err;
    // Parse CAMT XML from file
    if (!parser.parse_file("statement.camt053.xml", doc, &err)) {
        std::cerr << "Parse error: " << err << "\n";
        return 1;
    }
    // Export to CSV; `data` also receives the exported rows in memory
    camt::ExportData data;
    camt::ExportOptions opt;
    opt.include_header = true;
    // Binary mode keeps line endings and the optional BOM byte-exact
    std::ofstream out("export.csv", std::ios::binary);
    camt::export_entries_csv(doc, &out, &data, opt);
    std::cout << "Export completed: export.csv\n";
    return 0;
}
~~~
### Field Semantics: `first` vs `second`
Each exported field consists of a pair:
- **first** → Human-readable formatted value
- **second** → Canonical normalized value used for sorting, hashing, and deduplication
`second` is **never empty**: if a field does not assign it explicitly, it is derived from `first` via the normalization rules.
### Supported Input Methods
| Input type | Function |
|-----------|----------|
| File path | `parse_file(const std::string& path, ...)` |
| Input stream | `parse_file(std::istream&, ...)` |
| Memory buffer | `parse_string(const char* buffer, ...)` |
### Complete Demonstration
A complete working test and demonstration is available in `examples/main.cpp`.
## ExportOptions
| Option | Default | Description |
|--------|---------|-------------|
| `delimiter` | `';'` | CSV separator (`;`, `,`, or `\t`) |
| `include_header` | `true` | Adds header row |
| `write_utf8_bom` | `false` | Necessary for Excel UTF-8 import |
| `signed_amount` | `true` | Credit positive / Debit negative |
| `credit_as_bool` | `true` | `IsCredit = 1/0` instead of `CRDT/DBIT` |
| `remittance_separator` | `""` | Join multiple `Ustrd[]` lines |
| `use_effective_credit` | `false` | Apply reversal indicator |
| `prefer_ultimate_counterparty` | `true` | Prefer `UltmtDbtr` / `UltmtCdtr` |
## Exported Row Format: Display Value (`first`) vs Normalized Value (`second`)
Each exported field consists of a pair:
- **first**: Human-readable display value (formatted, signed, date-formatted)
- **second**: Canonical normalized value used for sorting, comparison, and deterministic hashing.
`second` is **never empty** — if not assigned directly, it is generated via normalization rules.
### Field overview (C1 display naming)
| Field | `first` (display) | `second` (normalized raw) | Influenced by options |
|---|---|---|---|
| Value Date | `YYYY-MM-DD` | `YYYYMMDD` digits | Sorting (`useBookingDate=false`) |
| Booking Date | `YYYY-MM-DD` | `YYYYMMDD` digits | Sorting (`useBookingDate=true`) |
| Amount | Formatted signed or unsigned | Absolute numeric text | `signed_amount`, `use_effective_credit` |
| Is Credit | `1`/`0` or `CRDT`/`DBIT` | `1`/`0` original direction | `credit_as_bool`, `use_effective_credit` |
| Reversal | `1` or `0` | `1` or `0` | Affects sign if `use_effective_credit=true` |
| Currency | `EUR` etc. | Uppercased, trimmed | Normalization only |
| Counterparty Name | Human-friendly chosen party | NFC/casefold/trim normalized | `prefer_ultimate_counterparty`, normalization |
| Counterparty IBAN | Formatted IBAN | Uppercase, no spaces | Normalization |
| Counterparty BIC | Formatted BIC | Uppercase, no spaces | Normalization |
| Remittance Line | Joined text lines | GS-joined normalized tokens | `remittance_separator`, `USE_UTF8PROC` |
| Remittance Structured | Display text | Normalized base | Normalization |
| End-to-End ID | Shown as provided | Uppercase, no spaces | Normalization |
| Mandate ID | Shown as provided | Uppercase, no spaces | Normalization |
| Transaction ID | Shown as provided | Uppercase, no spaces | Normalization |
| Bank Reference | Display reference | Normalized | Normalization |
| Account IBAN | Statement IBAN | Uppercase, no spaces | Normalization |
| Account BIC | Statement BIC | Uppercase, no spaces | Normalization |
| Booking Code | Code as text | Uppercase trimmed | Normalization |
| Status | Display text | Trimmed | Normalization |
| Running Balance | Formatted running total | Same as display | Always signed logically (`CRDT = +`, `DBIT = −`) |
| Charges Amount | Display charges | Same as display | Independent of `signed_amount` |
| Charges Currency | Display currency | Normalized | Normalization |
| Charges Included | `1`/`0` | Same | None |
| Entry Ordinal | Display index | Same | Used as stable tiebreaker |
| Transaction Ordinal | Display index | Same | Used as stable tiebreaker |
These conventions keep the interpretation consistent when converting to CSV, databases, or accounting systems.
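
The interaction between `signed_amount`, `use_effective_credit`, and the reversal flag can be sketched as follows. `display_amount` is a hypothetical helper, and it assumes that a reversal simply flips the credit/debit direction when `use_effective_credit` is enabled; the library's exact rule may differ.

```cpp
#include <string>

// Hypothetical helper. abs_amount is the unsigned decimal text (the Amount
// `second` value); the result is the display text (the Amount `first` value).
// Assumption: with use_effective_credit, a reversal flips the direction.
std::string display_amount(const std::string& abs_amount, bool is_credit,
                           bool is_reversal, bool signed_amount,
                           bool use_effective_credit) {
    bool credit = is_credit;
    if (use_effective_credit && is_reversal) credit = !credit;
    // Debits get a leading minus only when signed output is requested.
    if (signed_amount && !credit) return "-" + abs_amount;
    return abs_amount;
}
```

Because the amount stays a string throughout, no locale-dependent number formatting or floating-point rounding is involved.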
### Field overview (condensed)
The same `first` / `second` pairs in condensed form, including the balance rows:
| Display Name | `first` (human output) | `second` (normalized / for sorting) | Relevant Options |
|---|---|---|---|
| **Value Date** | `YYYY-MM-DD` | `YYYYMMDD` | Affects sorting order |
| **Booking Date** | `YYYY-MM-DD` | `YYYYMMDD` | Sorting if selected |
| **Amount** | Signed or unsigned amount text | Absolute numeric amount | `signed_amount`, `use_effective_credit` |
| **Is Credit** | `1` (credit) / `0` (debit) or `CRDT/DBIT` | Always `1/0` for original direction | `credit_as_bool`, `use_effective_credit` |
| **Reversal** | `1` or `0` | Same | May flip meaning under `use_effective_credit` |
| **Counterparty Name** | Best resolved party name | Case-normalized canonical text | `prefer_ultimate_counterparty` |
| **Counterparty IBAN** | Standard IBAN text | Uppercased, spaces removed | Normalization |
| **Counterparty BIC** | Standard BIC text | Uppercased, spaces removed | Normalization |
| **Remittance** | Joined free text lines | Canonical token-joined, normalized lines | `remittance_separator`, `USE_UTF8PROC` |
| **Structured Reference** | Display reference text | Normalized canonical text | `USE_UTF8PROC` |
| **End-to-End ID** | Displayed as present | Uppercased, spaces removed | Normalization |
| **Mandate ID / Transaction ID / Bank Reference** | Display text | Normalized ID | Normalization |
| **Running Balance** | Accumulated signed balance | Same as display | Determined by sorting order |
| **Opening / Closing Balance** | Displayed balance | Same | Placement depends on statement order |
Normalization rules:
- Text fields are trimmed and unified
- Codes & currency uppercased
- IBAN/BIC spaces removed
- Date `.second` always `YYYYMMDD`
- `Amount.second` is always absolute value
- Advanced Unicode normalization if compiled with `USE_UTF8PROC`
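
A minimal sketch of the simpler rules above (codes, dates, amounts), written with explicit ASCII handling so that it does not depend on the C locale. The function names are hypothetical and only approximate what the library does internally.

```cpp
#include <string>

// Hypothetical helper: uppercase a-z and drop ASCII whitespace
// (the rule listed for IBAN, BIC, and similar codes).
std::string normalize_code(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') continue;
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
        out += c;
    }
    return out;
}

// Hypothetical helper: "YYYY-MM-DD" -> "YYYYMMDD" by keeping digits only.
std::string normalize_date(const std::string& in) {
    std::string out;
    for (char c : in)
        if (c >= '0' && c <= '9') out += c;
    return out;
}

// Hypothetical helper: absolute amount text, i.e. strip one leading sign.
std::string normalize_amount(const std::string& in) {
    if (!in.empty() && (in[0] == '-' || in[0] == '+')) return in.substr(1);
    return in;
}
```

For example, `normalize_code("de89 3704 0044 0532 0130 00")` yields `DE89370400440532013000`, which compares equal regardless of how the bank formatted the IBAN.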
## Exported Row Format: first vs second
Every column in `ExportData` is stored as a pair:
- `first` → human-readable, formatted display value
- `second` → normalized, canonical value used for sorting, hashing, and stable processing
If a field does not explicitly assign `second`, it is filled from `first` via normalization.
Normalization rules include:
- Date fields → `YYYYMMDD`
- IBAN/BIC → uppercase, no spaces
- Free-text fields → trimmed, casefolded (with utf8proc if enabled)
- Amounts → `second` stores absolute value
Sorting uses `.second` for date ordering.
Running balance uses signed logic independent of display formatting.
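
The ordering and running-balance logic described here can be sketched as below. `SketchRow` and `apply_running_balance` are hypothetical, and the sketch assumes amounts have already been converted from text to signed integer minor units (cents), which is a choice made for this example rather than something the README specifies.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row: only the fields needed for ordering and accumulation.
struct SketchRow {
    std::string date_key;   // normalized date (`second`), "YYYYMMDD"
    int entry_ordinal;      // stable tiebreaker
    int tx_ordinal;         // stable tiebreaker
    std::int64_t cents;     // signed amount: credit positive, debit negative
    std::int64_t running;   // filled in by apply_running_balance
};

// Sort by normalized date, then by the ordinals, then accumulate the
// signed amounts starting from the opening balance.
void apply_running_balance(std::vector<SketchRow>& rows, std::int64_t opening) {
    std::sort(rows.begin(), rows.end(),
              [](const SketchRow& a, const SketchRow& b) {
                  return std::tie(a.date_key, a.entry_ordinal, a.tx_ordinal) <
                         std::tie(b.date_key, b.entry_ordinal, b.tx_ordinal);
              });
    std::int64_t balance = opening;
    for (auto& r : rows) {
        balance += r.cents;
        r.running = balance;
    }
}
```

Since `YYYYMMDD` strings sort lexicographically in date order, comparing the normalized value directly is enough; the ordinals keep the order deterministic for rows that share a date.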
## Optional Unicode Normalization (`USE_UTF8PROC`)
This library supports optional full Unicode normalization of free-text fields
(e.g., remittance lines, counterparty names, references).
If compiled with `-DUSE_UTF8PROC`, the following normalization is applied using
the MIT-licensed `utf8proc` library:
- Normalize text to **NFC**
- Unicode-aware case folding
- Removal of zero-width characters
- Stable whitespace normalization
If `USE_UTF8PROC` is **not** defined:
- A lightweight ASCII-only fallback is used
- No external dependencies are required
- No utf8proc code is linked or distributed
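
For a sense of what an ASCII-only fallback might look like, the sketch below lowercases `A-Z`, collapses runs of ASCII whitespace, and trims both ends, while passing non-ASCII bytes through unchanged. `ascii_fold` is a hypothetical name, and the library's actual fallback may apply different rules.

```cpp
#include <string>

// Hypothetical ASCII-only fallback: lowercase A-Z, collapse runs of ASCII
// whitespace into one space, and trim both ends. Bytes outside ASCII
// (for example UTF-8 sequences) are copied through untouched.
std::string ascii_fold(const std::string& in) {
    std::string out;
    bool pending_space = false;
    for (unsigned char c : in) {
        const bool space = c == ' ' || c == '\t' || c == '\n' ||
                           c == '\r' || c == '\f' || c == '\v';
        if (space) {
            // Remember the gap, but never emit a leading space.
            pending_space = !out.empty();
            continue;
        }
        if (pending_space) {
            out += ' ';
            pending_space = false;
        }
        if (c >= 'A' && c <= 'Z') c = static_cast<unsigned char>(c - 'A' + 'a');
        out += static_cast<char>(c);
    }
    return out;  // a trailing gap is dropped because it is never flushed
}
```

The limitation is visible with a name such as `MÜLLER`: the ASCII letters are folded but `Ü` is not, so it will not compare equal to `müller`. That gap is what full Unicode case folding closes.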
## Canonical Transaction Hashing
```cpp
std::string hash = camt::accumulate_hash_row(row);
```
The hash serves as a stable fingerprint for:
- Duplicate detection
- Ledger synchronization
- Audit trails
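
To illustrate the general idea of a canonical fingerprint, the sketch below hashes a list of normalized field values with 64-bit FNV-1a, appending the ASCII group separator (0x1D) after each field so that field boundaries affect the result. This is not the algorithm behind `camt::accumulate_hash_row`, which the README does not describe; `sketch_fingerprint` is a hypothetical stand-in.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical fingerprint: 64-bit FNV-1a over the normalized field values,
// each followed by the ASCII group separator so field boundaries count.
// This is NOT the algorithm used by camt::accumulate_hash_row.
std::uint64_t sketch_fingerprint(const std::vector<std::string>& normalized) {
    std::uint64_t h = 14695981039346656037ull;     // FNV-1a offset basis
    const std::uint64_t prime = 1099511628211ull;  // FNV-1a 64-bit prime
    auto mix = [&](unsigned char byte) {
        h ^= byte;
        h *= prime;
    };
    for (const auto& field : normalized) {
        for (unsigned char c : field) mix(c);
        mix(0x1D);  // group separator between fields
    }
    return h;
}
```

Hashing the normalized (`second`) values rather than the display values means that two exports of the same transaction with different formatting options would still produce the same fingerprint. A production system would likely prefer a cryptographic hash if collision resistance matters.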
## License
Released under the **MIT License**.
```
SPDX-License-Identifier: MIT
SPDX-FileCopyrightText: 2025 Psynetic
```