{"id":30604254,"url":"https://github.com/apotocki/dataforge","last_synced_at":"2025-08-30T01:37:15.416Z","repository":{"id":310083864,"uuid":"1038617885","full_name":"apotocki/dataforge","owner":"apotocki","description":"A C++20 header-only library for building powerful, composable data transformation pipelines — from integer ↔ bytes, base encodings, hashing, compression, and encryption to Unicode conversions.","archived":false,"fork":false,"pushed_at":"2025-08-15T16:52:10.000Z","size":285,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-15T17:36:35.182Z","etag":null,"topics":["c-plus-plus","checksums","compression","cpp20","data-pipeline","data-transformation","decoding","encoding","encryption","endian","expression-templates","hashing","header-only","icu","iterator","pipeline","unicode"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apotocki.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":"FUNDING.yml","license":"LICENSE_1_0.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"apotocki"}},"created_at":"2025-08-15T14:29:39.000Z","updated_at":"2025-08-15T16:52:13.000Z","dependencies_parsed_at":"2025-08-16T14:01:12.607Z","dependency_job_id":null,"html_url":"https://github.com/apotocki/dataforge","commit_stats":null,"previous_names":["apotocki/dataforge"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/apotocki/dataforge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apotocki%2Fdataf
orge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apotocki%2Fdataforge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apotocki%2Fdataforge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apotocki%2Fdataforge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apotocki","download_url":"https://codeload.github.com/apotocki/dataforge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apotocki%2Fdataforge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272792736,"owners_count":24993827,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-29T02:00:10.610Z","response_time":87,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","checksums","compression","cpp20","data-pipeline","data-transformation","decoding","encoding","encryption","endian","expression-templates","hashing","header-only","icu","iterator","pipeline","unicode"],"created_at":"2025-08-30T01:37:14.587Z","updated_at":"2025-08-30T01:37:15.405Z","avatar_url":"https://github.com/apotocki.png","language":"C++","funding_links":["https://github.com/sponsors/apotocki"],"categories":[],"sub_categories":[],"readme":"[![Windows MSVC 
Tests](https://github.com/apotocki/dataforge/actions/workflows/msvc-tests.yml/badge.svg)](https://github.com/apotocki/dataforge/actions/workflows/msvc-tests.yml)\n[![Linux GCC Tests](https://github.com/apotocki/dataforge/actions/workflows/linux-tests.yml/badge.svg)](https://github.com/apotocki/dataforge/actions/workflows/linux-tests.yml)\n[![macOS Tests](https://github.com/apotocki/dataforge/actions/workflows/macos-tests.yml/badge.svg)](https://github.com/apotocki/dataforge/actions/workflows/macos-tests.yml)\n# DataForge\n\n**DataForge** is a modern C++20 header-only library for building declarative, composable data transformation pipelines.  \nIt provides both push (output) and pull (input) iterator-based interfaces for applying arbitrary chains of conversions, including encoding, decoding, compression, encryption, hashing, and Unicode operations.  \nTransformations are described using *quarks* — small, composable objects that can be chained together with the `|` operator.\n\n## Quick Example\n\n```cpp\n// Standard headers for std::copy, std::cout, std::back_inserter, and std::string\n#include \u003calgorithm\u003e\n#include \u003ciostream\u003e\n#include \u003citerator\u003e\n#include \u003cstring\u003e\n\n#include \"dataforge/quark_push_iterator.hpp\"\n#include \"dataforge/quark_pull_iterator.hpp\"\n#include \"dataforge/base_xx/base64.hpp\"\n\nusing namespace dataforge;\n\nstd::string input = \"Hello, World!\";\nstd::string base64_result;\n\n// Create a pipeline: input bytes → Base64 encoding → output\nauto push_it = quark_push_iterator(int8 | base64, std::back_inserter(base64_result));\n*push_it = input;\npush_it.finish();\n\nstd::cout \u003c\u003c \"Encoded: \" \u003c\u003c base64_result \u003c\u003c std::endl;  // Output: SGVsbG8sIFdvcmxkIQ==\n\n// Reverse the process: Base64 → decoded bytes\nstd::string decoded_result;\nauto pull_it = quark_pull_iterator(base64 | int8, base64_result);\nfor (auto span = *pull_it; !span.empty(); span = *++pull_it) {\n    std::copy(span.begin(), span.end(), std::back_inserter(decoded_result));\n}\n\nstd::cout \u003c\u003c \"Decoded: \" \u003c\u003c decoded_result \u003c\u003c std::endl;  // Output: Hello, World!\n```\n\n**More 
complex pipelines** can chain multiple transformations:\n```cpp\n// Example: text → UTF-8 → compression → encryption → Base64\nauto pipeline = utf8 | deflated() | aes(128, key) | base64;\n```\n\n\u003e 📁 **See the [examples/](examples/) folder for complete working examples** including MD5 hashing, AES encryption, and more advanced use cases.\n\u003e \n\u003e 🧪 **For comprehensive algorithm coverage and advanced pipeline patterns, explore the [tests/](tests/) directory** — it contains hundreds of real-world examples demonstrating every supported algorithm, from basic CRC checksums to complex multi-stage encryption pipelines.\n\n## Why DataForge is Unique\n\nDataForge combines multiple types of data transformations in one consistent framework, unlike other libraries that cover only subsets of functionality.\n\n| Feature / Capability         | DataForge | Crypto++ | Boost | ICU | range-v3 |\n|-------------------------------|:---------:|:--------:|:----:|:---:|:--------:|\n| Integer ↔ Bytes + Endian     | ✅        | ❌       | ❌   | ❌  | ❌       |\n| base16/32/58/64/ascii85/z85  | ✅        | ✅       | ❌   | ❌  | ❌       |\n| Custom Base 1 \u003c N \u003c 256      | ✅        | ❌       | ❌   | ❌  | ❌       |\n| Checksums (crc, adler, bsd) | ✅        | ❌       | ❌   | ❌  | ❌       |\n| Hashes (MD, SHA, Blake, etc)| ✅        | ✅       | ❌   | ❌  | ❌       |\n| Encryption/Decryption        | ✅        | ✅       | ❌   | ❌  | ❌       |\n| Compression / Decompression  | ✅        | ❌       | ❌   | ❌  | ❌       |\n| Unicode Conversions (UTF)    | ✅        | ❌       | ❌   | ✅  | ❌       |\n| ICU Charset Conversions      | ✅        | ❌       | ❌   | ✅  | ❌       |\n| Grapheme Breaking            | ✅        | ❌       | ❌   | ✅  | ❌       |\n| Header-only                  | ✅        | ❌       | ✅   | ❌  | ✅       |\n| Push/Pull iterator pipelines | ✅        | ❌       | ✅ (filters) | ❌ | ✅       |\n\n**Key point:** DataForge allows chaining transformations like integer → endian → 
compression → encryption → base encoding in one declarative pipeline.\n\n## Key Features\n\n### 1. Integer ↔ Byte sequence conversions (with endianness)\n- Convert sequences of integers of various sizes to/from byte sequences.\n- Configurable **little-endian** or **big-endian** representation.\n\n### 2. Encoding / Decoding\n- **Base16, Base32, Base58, Base64, ASCII85, Z85**.\n- Arbitrary base conversion with `1 \u003c N \u003c 256` and a custom alphabet — effectively a positional numeral system transformation.\n\n### 3. Checksums\n- BSD checksum\n- Adler32\n- CRC8, CRC16, CRC32, CRC64\n\n### 4. Hash Functions\n- MD2, MD4, MD5, MD6\n- RIPEMD, Tiger\n- SHA1, SHA2, SHA3\n- Belt, GOST, Streebog, Whirlpool, Blake\n\n### 5. Encryption / Decryption\n- RC2, RC4, RC5, RC6\n- DES, AES, Blowfish\n- Belt, Magma\n\n### 6. Compression / Decompression\n- Deflate\n- Bzip2\n- LZ4\n- LZMA, LZMA2  \n*(requires corresponding external libraries)*\n\n### 7. Unicode Encoding Conversions\n- UTF-7, UTF-8, UTF-16, UTF-32\n\n### 8. ICU-based String Encoding Conversions\n- Any encoding supported by the [ICU library](https://icu.unicode.org/)  \n*(requires ICU library)*\n\n### 9. Grapheme Breaker\n- Splits a Unicode string into graphemes according to the [Unicode Standard](https://unicode.org/reports/tr29/).\n\n\n## Installation for Running Tests\n\nThe library itself is **header-only** — nothing needs to be built for use in your projects.  \nHowever, the test suite depends on external libraries (**zlib, icu, bzip2, lz4, liblzma, gtest**), which are managed via [vcpkg](https://github.com/microsoft/vcpkg).\n\n### Steps to build and run tests:\n1. **Install vcpkg** anywhere on your system (if not already installed).\n2. **Set the environment variable** `VCPKG_ROOT` to the location of your vcpkg installation.  \n   - Example (Windows PowerShell):\n     ```powershell\n     setx VCPKG_ROOT \"C:\\dev\\vcpkg\"\n     ```\n3. Open the Visual Studio solution for tests and build it.  
\n   - On the first build:\n     - The project will automatically:\n       1. Check that `VCPKG_ROOT` is set.\n       2. Run:\n          ```powershell\n          $(VCPKG_ROOT)\\vcpkg.exe install\n          ```\n          installing all required dependencies from `vcpkg.json` into a local `vcpkg_installed` folder.\n       3. Configure `INCLUDE` and `LIB` paths to use these locally installed dependencies.\n4. Run the tests from Visual Studio.\n\n**No global vcpkg integration (`vcpkg integrate install`) is required** — everything is local to the repository.\n\n## License\n\nDistributed under the [Boost Software License, Version 1.0](LICENSE_1_0.txt).\n\n---\n\n## As an advertisement...\nThe DataForge library is used in my iOS application on the App Store:\n\n[\u003ctable align=\"center\" border=0 cellspacing=0 cellpadding=0\u003e\u003ctr\u003e\u003ctd\u003e\u003cimg src=\"https://is4-ssl.mzstatic.com/image/thumb/Purple112/v4/78/d6/f8/78d6f802-78f6-267a-8018-751111f52c10/AppIcon-0-1x_U007emarketing-0-10-0-85-220.png/460x0w.webp\" width=\"70\"/\u003e\u003c/td\u003e\u003ctd\u003e\u003ca href=\"https://apps.apple.com/us/app/potohex/id1620963302\"\u003ePotoHEX\u003c/a\u003e\u003cbr\u003eHEX File Viewer \u0026 Editor\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e]()\n\nThis application lets you view and edit files at the byte or character level, calculate hashes, encode/decode, and compress/decompress selected byte regions.\n  \nYou can support my open-source development by trying the [App](https://apps.apple.com/us/app/potohex/id1620963302).\n\nFeedback is welcome!\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapotocki%2Fdataforge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapotocki%2Fdataforge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapotocki%2Fdataforge/lists"}