{"id":16607551,"url":"https://github.com/davidgraeff/modern_cpp_image_to_jpeg","last_synced_at":"2025-10-15T02:06:30.457Z","repository":{"id":54230323,"uuid":"233793849","full_name":"davidgraeff/modern_cpp_image_to_jpeg","owner":"davidgraeff","description":"Modern C++ in practice (Constexpr, std::filesystem, std::source_location etc): A tool to re-encode images in a dir / on a webpage to jpeg","archived":false,"fork":false,"pushed_at":"2021-03-02T07:22:04.000Z","size":347,"stargazers_count":0,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-17T12:15:58.495Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidgraeff.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-14T08:32:19.000Z","updated_at":"2020-01-14T09:16:31.000Z","dependencies_parsed_at":"2022-08-13T09:40:47.208Z","dependency_job_id":null,"html_url":"https://github.com/davidgraeff/modern_cpp_image_to_jpeg","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidgraeff%2Fmodern_cpp_image_to_jpeg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidgraeff%2Fmodern_cpp_image_to_jpeg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidgraeff%2Fmodern_cpp_image_to_jpeg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidgraeff%2Fmodern_cpp_image_to_jpeg/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/davidgraeff","download_url":"https://codeload.github.com/davidgraeff/modern_cpp_image_to_jpeg/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242774625,"owners_count":20183111,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-12T01:23:06.281Z","updated_at":"2025-10-15T02:06:25.405Z","avatar_url":"https://github.com/davidgraeff.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Modern C++17 (and some C++14) features in practise\n\nThis project shows off a few of the new core language additions as well as std enhancements\nand is a tool to auto-convert all recognised image files in a given directory or on a web-page to jpg files.\n\nImplemented are a C++17 version of the toojpeg library (~600 LOC), an http 'socket' (~600 LOC), a webpage-crawler (~100 LOC).\nFind benchmark results at the end of this document.\n\n## Build\n\nThe buildsystem is based on [CMake](https://cmake.org/download/) (\u003e3.1), no further dependencies are required.\nopenSSL is optional and if found, allows for https URLs to be crawled for images.\n\nBuild with: `mkdir build \u0026\u0026 cd build \u0026\u0026 cmake ../ \u0026\u0026 make` on Unix systems.\n\n## Run\n\nTo load and convert png, gif, jpg, bmp files in a directory, add the input directory as first argument\nand the output directory optionally as second argument:\n\n```shell script\n./image_to_jpeg ./dir_with_images ./out\n```\n\nIf input and output directory are the same, all jpeg encoded 
files will be stored with a `.new.jpg` extension.\n\nThe first argument can also be a URL. The page will be downloaded and all images referenced within an `\u003cimg src=\"..\"\u003e` tag\nwill be downloaded and jpeg encoded. The default quality is 90.\n\n```shell script\n./image_to_jpeg https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Images ./out\n```\n## Tests\n\nTests are stored in the `tests/` directory. Build them with `cmake` and `make tests`.\nA small test harness using C++17 features has been written in `tests/tests.h` to avoid external dependencies.\n\n## Documentation\n\nThe code is documented in a Doxygen-compatible format. Build the documentation via `cmake` and `make doc`.\nPrebuilt documentation is checked in at `doc/html`.\n\n## C-legacy\n\nThroughout this project C-legacy has been avoided as much as possible.\nThis includes old-style casts, C-Posix API (except the net Socket API) and the C preprocessor (macros).\n\nFor example:\n* `std::array\u003cuint8_t,8*8\u003e` instead of `uint8_t data[8*8]`\n* `std::experimental::source_location` instead of `__LINE__`, `__FILE__` etc.\n* `constexpr` instead of macros and `if constexpr ()` for conditional compilation\n\nC-Libraries like the stb_image library have been properly C++ type wrapped (see `src/image_loader.h`) to take advantage of RAII etc.\n\n## C++17 version of the toojpeg library\n\nThe original toojpeg only uses the C++11 `auto` keyword and is otherwise pretty much C++03.\nIts code size, many precomputed lookup tables and numeric values and the IO related domain\nmake it a good candidate to experiment with C++17 features.\n\n#### Modern Application Programming Interfaces\n\nHaving an API that cannot be (un)intentionally misused while offering an efficient interface is essential.\nIdeally the API expresses the developer's intention via the C++ type system with regard to the lifetime\nand ownership of input arguments. 
\n\nC++17 added the `nodiscard` attribute so that compilers warn if return values are not used. `noexcept`\nallows the compiler to avoid generating stack-unwinding code for functions that will never throw,\nwhich results in better runtime performance if exceptions are enabled:\n\n```c++\n[[nodiscard]] const char * data() const noexcept {/*...*/}\n```\n\nSmart pointers are used to express ownership:\n\n```c++\nexplicit BitWriter(std::unique_ptr\u003cstd::ostream\u003e \u0026\u0026output_) : output(std::move(output_)) {}\n```\n\nIn a non-owning situation, raw pointers and custom (wrapper) types like `ByteView` (a pointer + size) are the correct choice:\n\n```c++\nBitWriter \u0026operator\u003c\u003c(ByteView data) { output-\u003ewrite(data.data(), data.size()); return *this; }\n\nbool writeJpeg(..., const uint8_t *pixels, ...) {/*...*/}\n```\n\nThe new `std::byte` type (C++17) has been used instead of (unsigned) `char`s.\nTo quote cppreference:\n\u003e \"std::byte is a distinct type that implements the concept of byte as specified in the C++ language definition.\"\n\nTo accommodate the fact that `std::byte` cannot simply be created (implicitly) from an integral number,\na (constexpr) user-defined literal (available since C++11) helps out (usage: `0xff_bn`):\n\n```c++\nconstexpr std::byte operator \"\" _bn(unsigned long long v) { return std::byte(v); }\n```\n\nAnd finally new composed data types like `std::optional`, `std::any`, `std::variant` (and the existing `std::tuple`)\nand automatic destructuring (\"Structured binding declaration\") of those (C++17) allow for richer return types:\n\n```c++\nconstexpr auto scaled_luminance_chrominance(...) -\u003e std::tuple\u003cstd::array\u003c...\u003e, std::array\u003c...\u003e\u003e {\n    /* ... 
*/\n    return std::make_tuple(scaledLuminance, scaledChrominance);\n}\n\n/// Structured binding declaration\nauto [scaledLuminance, scaledChrominance] = scaled_luminance_chrominance(...);\n```\n\n#### Compile-time precompute with `constexpr`\n\nIt is not unusual to have to decide on a space vs. time tradeoff.\nA C++ developer up to C++17 may have favoured runtime computations or \"magic constants\",\njust because that is more convenient than writing an external generator tool and integrating it\nand its results into a buildsystem.\n\nWith C++17 `constexpr` got much more expressive (temporary state like inline variables are supported)\nand it will be extended even more with C++20 (const-boundary aware heap usage).\n\nIn this project `constexpr` was used on multiple occasions, e.g. to eliminate \"magic numbers\":\n\n```c++\n// Before\nconst auto SqrtHalfSqrt = 1.306562965f;\n// After\nconstexpr double SqrtHalfSqrt = sqrt((2 + sqrt(2)) / 2);\n```\n\nC++ does not yet offer `constexpr` math (an RFC exists),\nbut implementing, for example, an approximation of the square root takes just a few lines:\n```c++\ndouble constexpr sqrt(double x) { return sqrt_helper(x, x, 0); }\ndouble constexpr sqrt_helper(double x, double curr, double prev) {\n    return is_close(curr, prev) ? 
curr : sqrt_helper(x, 0.5 * (curr + x / curr), curr);\n}\ntemplate\u003ctypename T\u003e\nconstexpr auto is_close(T a, T b) -\u003e bool {\n    return std::abs(a - b) \u003c= std::numeric_limits\u003cT\u003e::epsilon() * std::abs(a + b)\n           || std::abs(a - b) \u003c std::numeric_limits\u003cT\u003e::min();\n}\n```\n\n`constexpr` is especially useful for pre-computing lookup tables, for example the Huffman table\n(see `constexpr std::array\u003cBitCode, 256\u003e generateHuffmanTable(const uint8_t numCodes[16], const uint8_t *values)` in `toojpeg_17.hpp`),\nand quantisation tables (in `constexpr auto quant_table(const std::array\u003cuint8_t, 8 * 8\u003e defaults) -\u003e std::array\u003cstd::byte, 8 * 8\u003e`).\n\nThis allows using the exact math written in the original papers (often modulo, division), as we do not need to care about\nruntime penalties.\n\n```c++\n// Before (faster but not obvious substitutions)\nauto row = ZigZagInv[i] \u003e\u003e 3;\nauto column = ZigZagInv[i] \u0026 7;\n// After\nauto row = ZigZagInv[i] / 8;\nauto column = ZigZagInv[i] % 8;\n```\n\n## Http / TCP Socket\n\nA tiny, blocking http tcp socket type has been implemented for this tool,\nbased on the Posix socket and network C-API. As with all C-wrapping types,\nRAII is used for resource management (socket closing, resource freeing),\nwhich also works with the error handling strategy (exceptions).\n\n\u003e IMO: As soon as `std::expected` is part of C++ (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0323r3.pdf),\n\u003e I will instantly swap out runtime-costly exception usage. 
See also examples in https://github.com/TartanLlama/expected.\n\nAlthough the type can be conditionally compiled\nwith and without SSL support (via openSSL), the implementation avoids massive\nusage of `#ifdef`s.\n\nInstead C++17's `__has_include` for conditional header inclusion\nand type traits in combination with `std::enable_if` (and a bit of SFINAE) have been used.\n```c++\n/// with ssl\ntemplate\u003cbool with_https_ = with_https, typename = std::enable_if_t\u003cwith_https_, std::size_t\u003e\u003e\n[[nodiscard]] std::size_t read_from_socket(std::enable_if_t\u003cwith_https_, std::size_t\u003e dataRead) {\n    return cSSL_ ? // Runtime decision if this is an SSL socket\n           SSL_read(cSSL_, this-\u003ebuffer_.data() + dataRead, this-\u003ebuffer_.size() - dataRead) :\n           read(getSocketId(), this-\u003ebuffer_.data() + dataRead, this-\u003ebuffer_.size() - dataRead);\n}\n\n// without ssl\ntemplate\u003cbool with_https_ = with_https, typename = std::enable_if_t\u003c!with_https_, std::size_t\u003e\u003e\n[[nodiscard]] std::size_t read_from_socket(std::size_t dataRead) {\n    return read(getSocketId(), this-\u003ebuffer_.data() + dataRead, this-\u003ebuffer_.size() - dataRead);\n}\n\n// Conditionally decide what to do, no #ifdef required anymore\nif constexpr (with_https) {\n    ...\n}\n```\n\n## Extended C++ std: Regex, Filesystem, Parallel Algorithms\n\nThe filesystem library is a massive addition to the C++17 standard library.\n\nThis project uses `std::filesystem` for simple operations like retrieving the current path, removing a file, checking\nif a file exists, and for more elaborate tasks like enumerating all files, not directories, of a certain directory.\nEven that is just a few lines of code:\n```c++\nusing namespace std::filesystem;\nconst path input(argv[1]);\nauto files = directory_iterator(input);\nstd::for_each(std::execution::par, std::filesystem::begin(files), std::filesystem::end(files), process_file);\n```\n\nThe C++11 
`std::regex` (and implicitly the C++17 `std::basic_regex` deduction guide) has been used for\nthe webpage image url crawler, found in `main.cpp::webpage_crawler`. `std::regex` is part of C++11 already,\nbut gained a few more convenience methods. The basic usage pattern in this project is:\n\n```c++\nstd::string page = \"...\";\nstatic std::regex url_regex(R\"(.*src=[\"']([^\"']*?(?:jpg|png|bmp|gif|pnm|JPG|PNG|BMP|GIF|PNM))[\"'].*)\");\nstd::match_results\u003cstd::string::const_iterator\u003e match;\n\nwhile (std::regex_search(page, match, url_regex)) {\n    auto image_url = Socket::Url::from_relative(url, match[1].str());\n    page = match.suffix();\n}\n```\n\nThe parallel algorithm support of the C++17 standard library has been used to read, compute and output multiple files in parallel:\n```c++\n#include \u003calgorithm\u003e\n#include \u003cexecution\u003e\nstd::for_each(std::execution::par, std::filesystem::begin(files), std::filesystem::end(files), process_file);\n```\n\nAs this is not a memory-bound operation, but mostly an IO one, this speeds up file processing.\n\n`std::execution::par` is one of four (three in C++17) specified strategies and does not give guarantees with respect to the sequence\n(\"invocations executing in the same thread are indeterminately sequenced\") or the execution threads\n(\"...are permitted to execute in either the invoking thread or in a thread implicitly created by the library\").\n\n## Benchmark\n\nA benchmark binary downloads and loads a 2d-projected picture of the earth\n(https://upload.wikimedia.org/wikipedia/commons/3/3d/Eckert4.jpg, 1.6MB, Creative Commons License)\nand writes it with the original and re-implemented toojpeg library.\n\nBuild the benchmark (found in `src/benchmark/main.cpp`) via `cmake` and `make benchmark`.\nRun it with `./benchmark` in the build directory. 
Each routine is run 20 times and the time is summed up.\nNo warm-up happens, but invoking the benchmark multiple times shows similar numbers on my Core i7 8th Gen.\n\n```\nOriginal TooJpeg : 201 ms. Bytes: 275324\nTooJpeg17 : 247 ms. Bytes: 275365\n```\n\nThe output byte counts are different because the jpeg comment differs.\n(The test suite makes sure that breaking changes to the code result in failing integration tests.)\nNo data is actually written to disk.\n\nI expected a performance gain from pre-computing the lookup tables and was surprised by the numbers.\n\nUsing `std::array` comes with its own implications: unlike a C array, it is fully copied\nwhen passed by value. This results in many more copy operations,\nand function arguments cannot simply be passed in CPU registers but require the stack.\nThis can be observed in the given runtime penalty. To mitigate this, some by-value arguments have\nto be references/pointers instead.\n\nA small benchmark suite, integrated into CI, helps to find changes with a negative impact.\nI failed to set that up earlier in the process.\n\n## Acknowledgements\n\nExternal libraries and header-only libraries used:\n\n* openSSL for https (optional)\n* [`stb_image.h`](https://github.com/nothings/stb) for loading images (in src/vendor/sb_image.h).\n* [PicoSHA2](https://github.com/okdshin/PicoSHA2) for the integration tests (in src/vendor/sha2.h).\n* [TooJpeg](https://create.stephan-brumme.com/toojpeg/) the original jpeg encoding library (code can also be found in src/benchmark/toojpeg.*).\n\n---\n\nDavid Gräff, 2020 ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidgraeff%2Fmodern_cpp_image_to_jpeg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidgraeff%2Fmodern_cpp_image_to_jpeg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidgraeff%2Fmodern_cpp_image_to_jpeg/lists"}