https://github.com/psynetic-software/camt-parser

CAMT XML Parser for SEPA banking data (C++).
Host: GitHub
URL: https://github.com/psynetic-software/camt-parser
Owner: psynetic-software
License: mit
Created: 2025-11-06T15:08:24.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-11-06T18:18:08.000Z (5 months ago)
Last Synced: 2025-11-06T18:19:47.467Z (5 months ago)
Topics: banking, camt, camt052, camt053, camt054, cpp, csv, finance, mit, parser, qif, sepa, swift, xml
Language: C++
Homepage: https://www.taxpool.net/
Size: 34.2 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# camt-parser

A high-performance C++ library for parsing SEPA CAMT XML banking formats.

Supports:

- **camt.052** Bank-to-Customer Account Report

- **camt.053** Bank-to-Customer Statement

- **camt.054** Bank-to-Customer Debit/Credit Notification

The library provides a structured, domain-friendly data model and an optional
CSV export layer optimized for accounting, reconciliation, and audit workloads.

## Features

- Fully supports **camt.052 / .053 / .054**

- Extracts complete transaction details (counterparty, remittance, references)

- Unicode-aware free-text normalization (optional `USE_UTF8PROC`)

- Bank transaction code → **GVC** mapping included

- Optional **canonical transaction hash** for duplicate detection

- CSV export designed for **accounting systems**

- No locale dependencies (monetary values are kept as text until formatted)

- **MIT License** (commercial-friendly)

## Installation / Build

```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```

Optional dependencies:

- **pugixml** (MIT) for XML parsing

- **utf8proc** (MIT) for Unicode normalization (if `USE_UTF8PROC` is defined)

## High-Level Data Model

```
Document
 └─ Statement(s)
     └─ Entry(s)
         └─ EntryTransaction(s)
```

### `Document`

Represents a full CAMT file (052 / 053 / 054).
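
Which of the three message types a file contains can usually be read from its ISO 20022 namespace (for example `urn:iso:std:iso:20022:tech:xsd:camt.053.001.02`). The following is a minimal sketch of that idea, not the library's own detection logic; `CamtKind` and `detect_camt_kind` are hypothetical names, and the plain substring search stands in for proper inspection of the root element.

```cpp
#include <cstddef>
#include <string>

// Message types this README lists as supported (hypothetical enum).
enum class CamtKind { Unknown, Camt052, Camt053, Camt054 };

// Guess the message type from the ISO 20022 namespace URN.
// Assumption: a substring search is good enough for a quick pre-check;
// a real parser would read the xmlns attribute of the root element.
CamtKind detect_camt_kind(const std::string& xml) {
    const std::string prefix = "urn:iso:std:iso:20022:tech:xsd:camt.";
    const std::size_t pos = xml.find(prefix);
    if (pos == std::string::npos) return CamtKind::Unknown;

    // The three digits after "camt." identify the message type.
    const std::string code = xml.substr(pos + prefix.size(), 3);
    if (code == "052") return CamtKind::Camt052;
    if (code == "053") return CamtKind::Camt053;
    if (code == "054") return CamtKind::Camt054;
    return CamtKind::Unknown;
}
```

A check like this could run before parsing, for example to route files by type.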

### `Statement`

Contains metadata (account, creation time, balances) and `Entry` records.

### `Entry`

Represents a booked transaction line (booking date, value date, amount).

May contain **multiple** `EntryTransaction` elements.

### `EntryTransaction`

Contains full payment details:

- Counterparty (IBAN/BIC/Name)

- Remittance text (structured + unstructured)

- ISO 20022 BankTransactionCode (Domain / Family / SubFamily)

- Proprietary bank code

- Charges, fees, reversal indicator, etc.
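
To make the nesting concrete, here is an illustrative sketch of the hierarchy as plain structs. The type and member names (`DocumentSketch`, `EntrySketch`, and so on) are hypothetical and far smaller than the real `camt::` types. The sketch only shows how one entry can carry several transactions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the hierarchy above; the real camt:: types may differ.
struct EntryTransactionSketch {
    std::string counterparty_name;
    std::string counterparty_iban;
    std::vector<std::string> remittance_lines;  // unstructured remittance text
};

struct EntrySketch {
    std::string booking_date;  // kept as text, e.g. "2025-01-31"
    std::string value_date;
    std::string amount;        // kept as text, no locale-dependent parsing
    std::vector<EntryTransactionSketch> transactions;  // one or more per entry
};

struct StatementSketch {
    std::string account_iban;
    std::vector<EntrySketch> entries;
};

struct DocumentSketch {
    std::vector<StatementSketch> statements;
};

// Count all transaction details by walking the three levels.
std::size_t count_transactions(const DocumentSketch& doc) {
    std::size_t n = 0;
    for (const auto& st : doc.statements)
        for (const auto& e : st.entries)
            n += e.transactions.size();
    return n;
}
```

A batch booking would appear as a single `EntrySketch` with several elements in `transactions`.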

## Parsing API

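```cpp
#include <iostream>
#include <string>
// Also include the camt-parser header (see examples/main.cpp).

camt::Parser p;
camt::Document doc;
std::string err;  // receives a description of the failure, if any

if (!p.parse_file("statement.xml", doc, &err)) {
    std::cerr << "Parse error: " << err << "\n";
}
```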

## CSV Export

```cpp
#include <fstream>
// Also include the camt-parser header (see examples/main.cpp).

std::ofstream out("export.csv");

camt::ExportOptions opt;
opt.write_utf8_bom = true;          // Excel compatible
opt.remittance_separator = " | ";   // readable multi-part purpose text
opt.signed_amount = true;
opt.credit_as_bool = true;

// `doc` is a camt::Document filled by a previous parse_file() call
camt::export_entries_csv(doc, &out, nullptr, opt);
```
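
For readers who want to see what writing such a CSV involves, the sketch below shows generic RFC 4180 style quoting and the UTF-8 byte order mark. It is not the library's exporter; `csv_escape`, `csv_row`, and `kUtf8Bom` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// UTF-8 byte order mark (EF BB BF); written once at the start of the file.
const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Quote a field if it contains the delimiter, a quote, or a line break.
std::string csv_escape(const std::string& field, char delimiter) {
    bool needs_quotes = false;
    for (char c : field) {
        if (c == delimiter || c == '"' || c == '\n' || c == '\r') {
            needs_quotes = true;
            break;
        }
    }
    if (!needs_quotes) return field;

    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // embedded quotes are doubled
        out += c;
    }
    out += '"';
    return out;
}

// Join escaped fields into one line terminated by '\n'.
std::string csv_row(const std::vector<std::string>& fields, char delimiter) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) line += delimiter;
        line += csv_escape(fields[i], delimiter);
    }
    line += '\n';
    return line;
}
```

Remittance text is the field most likely to need quoting, since it can contain the delimiter itself.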

## Quick Start Example

This example demonstrates parsing a CAMT file and exporting transactions to CSV:

~~~cpp
#include <fstream>
#include <iostream>
#include <string>
// Also include the camt-parser header (see examples/main.cpp).

int main() {
    camt::Parser parser;
    camt::Document doc;
    std::string err;

    // Parse CAMT XML from file
    if (!parser.parse_file("statement.camt053.xml", doc, &err)) {
        std::cerr << "Parse error: " << err << "\n";
        return 1;
    }

    // Export to CSV; `data` also receives the exported rows
    camt::ExportData data;
    camt::ExportOptions opt;
    opt.include_header = true;

    std::ofstream out("export.csv", std::ios::binary);
    camt::export_entries_csv(doc, &out, &data, opt);

    std::cout << "Export completed: export.csv\n";
    return 0;
}
~~~

### Field Semantics: `first` vs `second`

Each exported field consists of a pair:

- **first** → Human-readable formatted value  

- **second** → Canonical normalized value used for sorting, hashing, and deduplication

`second` is **never empty**: if not present, it is generated via normalization rules.

### Supported Input Methods

| Input type | Function |
|-----------|----------|
| File path | `parse_file(const std::string& path, ...)` |
| Input stream | `parse_file(std::istream&, ...)` |
| Memory buffer | `parse_string(const char* buffer, ...)` |

### Complete Demonstration

A complete, working test and demonstration is available in `examples/main.cpp`.

## ExportOptions

| Option | Default | Description |
|--------|---------|-------------|
| `delimiter` | `';'` | CSV separator (`;`, `,`, or `\t`) |
| `include_header` | `true` | Write a header row |
| `write_utf8_bom` | `false` | Write a UTF-8 BOM (needed for Excel to detect UTF-8) |
| `signed_amount` | `true` | Credits positive, debits negative |
| `credit_as_bool` | `true` | `IsCredit` as `1`/`0` instead of `CRDT`/`DBIT` |
| `remittance_separator` | `""` | Separator used to join multiple `Ustrd` lines |
| `use_effective_credit` | `false` | Apply the reversal indicator to the credit/debit direction |
| `prefer_ultimate_counterparty` | `true` | Prefer `UltmtDbtr` / `UltmtCdtr` as counterparty |
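
The interaction of `signed_amount`, `credit_as_bool`, and `use_effective_credit` can be sketched as follows. This is an interpretation of the option descriptions, not the library's code. It assumes that a reversal flips the booked direction (a reversed credit behaves like a debit), and all function names are hypothetical.

```cpp
#include <string>

// Assumption: with use_effective_credit, the reversal indicator flips the direction.
bool effective_credit(bool is_credit, bool reversal, bool use_effective_credit) {
    return use_effective_credit ? (is_credit != reversal) : is_credit;
}

// Format the amount column from an absolute decimal text such as "12.50".
std::string display_amount(const std::string& abs_amount, bool is_credit, bool reversal,
                           bool signed_amount, bool use_effective_credit) {
    if (!signed_amount) return abs_amount;
    const bool credit = effective_credit(is_credit, reversal, use_effective_credit);
    return credit ? abs_amount : "-" + abs_amount;
}

// Format the direction column as 1/0 or CRDT/DBIT.
std::string display_direction(bool credit, bool credit_as_bool) {
    if (credit_as_bool) return credit ? "1" : "0";
    return credit ? "CRDT" : "DBIT";
}
```

Under these assumptions, a reversed credit of `12.50` is shown as `-12.50` when both `signed_amount` and `use_effective_credit` are set, and as `12.50` when `use_effective_credit` is off.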

## Exported Row Format: Display Value (`first`) vs Normalized Value (`second`)

Each exported field consists of a pair:

- **first**: Human-readable display value (formatted, signed, date-formatted)

- **second**: Canonical normalized value used for sorting, comparison, and deterministic hashing.

`second` is **never empty** — if not assigned directly, it is generated via normalization rules.
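
As an illustration of the pair convention, the sketch below models a field as a `std::pair` and fills `second` from `first` only when it was not assigned. `Field`, `normalize_basic`, and `ensure_second` are hypothetical names, and the normalization shown (ASCII trim plus uppercase) is a placeholder for the field-specific rules.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical field type: first = display value, second = normalized value.
using Field = std::pair<std::string, std::string>;

// Placeholder normalization: trim ASCII whitespace, uppercase ASCII letters.
// Written without <cctype> so the result does not depend on the C locale.
std::string normalize_basic(const std::string& s) {
    auto is_space = [](char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; };
    std::size_t b = 0, e = s.size();
    while (b < e && is_space(s[b])) ++b;
    while (e > b && is_space(s[e - 1])) --e;

    std::string out = s.substr(b, e - b);
    for (char& c : out)
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    return out;
}

// Derive `second` from `first` only when it was not assigned explicitly.
void ensure_second(Field& f) {
    if (f.second.empty()) f.second = normalize_basic(f.first);
}
```

The display value is never modified, so the CSV output stays readable while comparisons use the normalized side.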

### Field overview (C1 display naming)

| Field | `first` (display) | `second` (normalized raw) | Influenced by options |
|---|---|---|---|
| Value Date | `YYYY-MM-DD` | `YYYYMMDD` digits | Sorting (`useBookingDate=false`) |
| Booking Date | `YYYY-MM-DD` | `YYYYMMDD` digits | Sorting (`useBookingDate=true`) |
| Amount | Formatted signed or unsigned | Absolute numeric text | `signed_amount`, `use_effective_credit` |
| Is Credit | `1`/`0` or `CRDT`/`DBIT` | `1`/`0` original direction | `credit_as_bool`, `use_effective_credit` |
| Reversal | `1` or `0` | `1` or `0` | Affects sign if `use_effective_credit=true` |
| Currency | `EUR` etc. | Uppercased, trimmed | Normalization only |
| Counterparty Name | Human-friendly chosen party | NFC/casefold/trim normalized | `prefer_ultimate_counterparty`, normalization |
| Counterparty IBAN | Formatted IBAN | Uppercase, no spaces | Normalization |
| Counterparty BIC | Formatted BIC | Uppercase, no spaces | Normalization |
| Remittance Line | Joined text lines | GS-joined normalized tokens | `remittance_separator`, `USE_UTF8PROC` |
| Remittance Structured | Display text | Normalized base | Normalization |
| End-to-End ID | Shown as provided | Uppercase, no spaces | Normalization |
| Mandate ID | Shown as provided | Uppercase, no spaces | Normalization |
| Transaction ID | Shown as provided | Uppercase, no spaces | Normalization |
| Bank Reference | Display reference | Normalized | Normalization |
| Account IBAN | Statement IBAN | Uppercase, no spaces | Normalization |
| Account BIC | Statement BIC | Uppercase, no spaces | Normalization |
| Booking Code | Code as text | Uppercase trimmed | Normalization |
| Status | Display text | Trimmed | Normalization |
| Running Balance | Formatted running total | Same as display | Always signed logically (`CRDT = +`, `DBIT = −`) |
| Charges Amount | Display charges | Same as display | Independent of `signed_amount` |
| Charges Currency | Display currency | Normalized | Normalization |
| Charges Included | `1`/`0` | Same | None |
| Entry Ordinal | Display index | Same | Used as stable tiebreaker |
| Transaction Ordinal | Display index | Same | Used as stable tiebreaker |

This section ensures consistent interpretation when converting to CSV, databases, or accounting systems.

### Field summary by display name

The same `first` / `second` pairs, condensed by display name:

| Display Name | `first` (human output) | `second` (normalized / for sorting) | Relevant Options |
|---|---|---|---|
| **Value Date** | `YYYY-MM-DD` | `YYYYMMDD` | Affects sorting order |
| **Booking Date** | `YYYY-MM-DD` | `YYYYMMDD` | Sorting if selected |
| **Amount** | Signed or unsigned amount text | Absolute numeric amount | `signed_amount`, `use_effective_credit` |
| **Is Credit** | `1` (credit) / `0` (debit) or `CRDT/DBIT` | Always `1/0` for original direction | `credit_as_bool`, `use_effective_credit` |
| **Reversal** | `1` or `0` | Same | May flip meaning under `use_effective_credit` |
| **Counterparty Name** | Best resolved party name | Case-normalized canonical text | `prefer_ultimate_counterparty` |
| **Counterparty IBAN** | Standard IBAN text | Uppercased, spaces removed | Normalization |
| **Counterparty BIC** | Standard BIC text | Uppercased, spaces removed | Normalization |
| **Remittance** | Joined free text lines | Canonical token-joined, normalized lines | `remittance_separator`, `USE_UTF8PROC` |
| **Structured Reference** | Display reference text | Normalized canonical text | `USE_UTF8PROC` |
| **End-to-End ID** | Displayed as present | Uppercased, spaces removed | Normalization |
| **Mandate ID / Transaction ID / Bank Reference** | Display text | Normalized ID | Normalization |
| **Running Balance** | Accumulated signed balance | Same as display | Determined by sorting order |
| **Opening / Closing Balance** | Displayed balance | Same | Placement depends on statement order |

Normalization rules:

- Text fields are trimmed and unified

- Codes & currency uppercased

- IBAN/BIC spaces removed

- Date `.second` always `YYYYMMDD`

- `Amount.second` is always absolute value

- Advanced Unicode normalization if compiled with `USE_UTF8PROC`
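
The rules above can be sketched for three field kinds as follows. These helpers are hypothetical and ASCII-only, and they are written from the rule descriptions rather than taken from the library.

```cpp
#include <string>

// IBAN/BIC: remove spaces and uppercase ASCII letters.
std::string normalize_iban(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == ' ') continue;
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
        out += c;
    }
    return out;
}

// Date: "YYYY-MM-DD" -> "YYYYMMDD" (keeps the first eight digits,
// so a trailing time part is ignored).
std::string normalize_date(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c >= '0' && c <= '9') out += c;
        if (out.size() == 8) break;
    }
    return out;
}

// Amount: drop a leading sign so the normalized text is the absolute value.
std::string normalize_amount(const std::string& s) {
    if (!s.empty() && (s[0] == '-' || s[0] == '+')) return s.substr(1);
    return s;
}
```

Because the amount stays text throughout, no locale-dependent number parsing is involved.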


## Exported Row Format: first vs second

Every column in `ExportData` is stored as a pair:

- `first` → human-readable, formatted display value

- `second` → normalized, canonical value used for sorting, hashing, and stable processing

If a field does not explicitly assign `second`, it is filled from `first` via normalization.

Normalization rules include:

- Date fields → `YYYYMMDD`

- IBAN/BIC → uppercase, no spaces

- Free-text fields → trimmed, casefolded (with utf8proc if enabled)

- Amounts → `second` stores absolute value

Sorting uses `.second` for date ordering.

Running balance uses signed logic independent of display formatting.
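
A minimal sketch of this ordering and accumulation, assuming amounts have already been converted to integer minor units (cents). `RowSketch` and both functions are hypothetical. The tiebreakers follow the entry and transaction ordinals listed in the field overview.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row: only the columns needed for ordering and the running total.
struct RowSketch {
    std::string date_second;    // normalized date, "YYYYMMDD"
    std::size_t entry_ordinal;  // stable tiebreakers
    std::size_t txn_ordinal;
    long long amount_cents;     // absolute amount in minor units
    bool is_credit;
    long long running_cents;    // filled by fill_running_balance()
};

// Sort by normalized date, then by entry and transaction ordinal.
// "YYYYMMDD" strings compare lexicographically in chronological order.
void sort_rows(std::vector<RowSketch>& rows) {
    std::sort(rows.begin(), rows.end(), [](const RowSketch& a, const RowSketch& b) {
        return std::tie(a.date_second, a.entry_ordinal, a.txn_ordinal) <
               std::tie(b.date_second, b.entry_ordinal, b.txn_ordinal);
    });
}

// Accumulate credits as + and debits as -, regardless of display formatting.
void fill_running_balance(std::vector<RowSketch>& rows, long long opening_cents) {
    long long balance = opening_cents;
    for (auto& r : rows) {
        balance += r.is_credit ? r.amount_cents : -r.amount_cents;
        r.running_cents = balance;
    }
}
```

Since the running balance depends on row order, the sort has to run first.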

## Optional Unicode Normalization (`USE_UTF8PROC`)

This library supports optional full Unicode normalization of free-text fields
(e.g., remittance lines, counterparty names, references).

If compiled with `-DUSE_UTF8PROC`, the following normalization is applied using
the MIT-licensed `utf8proc` library:

- Normalize text to **NFC**

- Unicode-aware case folding

- Removal of zero-width characters

- Stable whitespace normalization

If `USE_UTF8PROC` is **not** defined:

- A lightweight ASCII-only fallback is used

- No external dependencies are required

- No utf8proc code is linked or distributed
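
What an ASCII-only fallback might look like is sketched below. This is an assumption about its general shape, not the library's implementation. `normalize_text_ascii` is a hypothetical name, and non-ASCII bytes are passed through untouched.

```cpp
#include <string>

// ASCII-only sketch: lowercase A-Z, collapse whitespace runs, trim both ends.
// Bytes outside ASCII (UTF-8 multi-byte sequences) pass through unchanged.
std::string normalize_text_ascii(const std::string& s) {
    std::string out;
    bool pending_space = false;
    for (char c : s) {
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pending_space = !out.empty();  // leading whitespace is dropped
            continue;
        }
        if (pending_space) {
            out += ' ';  // a whitespace run becomes one space
            pending_space = false;
        }
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
        out += c;
    }
    return out;  // a trailing run is never flushed, so the end is trimmed
}
```

With this sketch, `"  Rechnung   NR\t42 "` becomes `"rechnung nr 42"`. Unlike full Unicode case folding, characters such as `Ü` are left as they are.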

## Canonical Transaction Hashing

```cpp
std::string hash = camt::accumulate_hash_row(row);
```

The result is a stable fingerprint, useful for:

- Duplicate detection

- Ledger synchronization

- Audit trails
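
The idea behind such a fingerprint can be illustrated with a generic hash over the normalized (`second`) values. The sketch below uses 64-bit FNV-1a, chosen here only for brevity. The algorithm behind `accumulate_hash_row` is not described in this README, and `fingerprint` is a hypothetical function.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a (64-bit) over the normalized fields. Each field is followed by the
// ASCII group separator (0x1D) so that field boundaries contribute to the hash.
std::uint64_t fingerprint(const std::vector<std::string>& normalized_fields) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    auto mix = [&h](unsigned char byte) {
        h ^= byte;
        h *= 1099511628211ULL;  // FNV prime
    };
    for (const auto& field : normalized_fields) {
        for (unsigned char c : field) mix(c);
        mix(0x1D);
    }
    return h;
}
```

Hashing the normalized values rather than the display values means that formatting options such as `signed_amount` or `remittance_separator` do not change the result for the same transaction.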

## License

Released under the **MIT License**.

```
SPDX-License-Identifier: MIT
SPDX-FileCopyrightText: 2025 Psynetic
```