https://github.com/wojciechmula/base64simd
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
avx2 avx512 base64 neon simd sse
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/wojciechmula/base64simd
- Owner: WojciechMula
- License: bsd-2-clause
- Created: 2016-09-04T16:02:47.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2025-02-21T23:00:12.000Z (8 months ago)
- Last Synced: 2025-03-29T04:06:09.807Z (6 months ago)
- Topics: avx2, avx512, base64, neon, simd, sse
- Language: C++
- Homepage: http://0x80.pl/articles/index.html#base64-algorithm-new
- Size: 415 KB
- Stars: 163
- Watchers: 16
- Forks: 14
- Open Issues: 4
Metadata Files:
- Readme: README.rst
- License: LICENSE
================================================================================
base64 using SIMD instructions
================================================================================

Overview
--------------------------------------------------

The repository contains code for encoding and decoding base64 using SIMD
instructions. Depending on the CPU architecture, vectorized encoding is faster
than the scalar versions by a factor of **2 to 4**; decoding is **2 to 2.7**
times faster.

There are several versions of the procedures, utilizing the following
instruction sets:

* SSE,
* AVX2,
* AVX512F,
* AVX512BW,
* AVX512VBMI,
* AVX512VL,
* BMI2, and
* ARM Neon.

Vectorization approaches were described in a series of articles:

* `Base64 encoding with SIMD instructions`__,
* `Base64 decoding with SIMD instructions`__,
* `Base64 encoding & decoding using AVX512BW instructions`__ (includes AVX512VBMI and AVX512VL),
* `AVX512F base64 coding and decoding`__.

__ http://0x80.pl/notesen/2016-01-12-sse-base64-encoding.html
__ http://0x80.pl/notesen/2016-01-17-sse-base64-decoding.html
__ http://0x80.pl/notesen/2016-04-03-avx512-base64.html
__ http://0x80.pl/articles/avx512-foundation-base64.html

`Daniel Lemire`__ and I also wrote the paper `Faster Base64 Encoding
and Decoding Using AVX2 Instructions`__, which was published
by `ACM Transactions on the Web`__.

__ http://lemire.me
__ https://arxiv.org/abs/1704.00605
__ https://tweb.acm.org/

Performance results from various machines are located
in subdirectories ``results``.

Project organization
--------------------------------------------------

There are separate subdirectories for both algorithms, however both have
the same structure. Each project contains four programs:

* ``verify`` --- does simple validation of particular parts of algorithms,
* ``check`` --- validates whole procedures,
* ``speed`` --- compares speed of different variants of procedures,
* ``benchmark`` --- similar to ``speed``, but works on small buffers and
  calculates the CPU cycle rate (available only for Intel architectures).

Building
--------------------------------------------------

Change to either the ``encode`` or the ``decode`` directory and then use the following
``make`` commands.

.. list-table::
   :header-rows: 1

   * - command
     - tools
     - instruction sets
   * - ``make``
     - ``verify``, ``check``, ``speed``, ``benchmark``
     - scalar, SSE, BMI2
   * - ``make avx2``
     - ``verify_avx2``, ``check_avx2``, ``speed_avx2``, ``benchmark_avx2``
     - scalar, SSE, BMI2, AVX2
   * - ``make avx512``
     - ``verify_avx512``, ``check_avx512``, ``speed_avx512``, ``benchmark_avx512``
     - scalar, SSE, BMI2, AVX2, AVX512F
   * - ``make avx512bw``
     - ``verify_avx512bw``, ``check_avx512bw``, ``speed_avx512bw``, ``benchmark_avx512bw``
     - scalar, SSE, BMI2, AVX2, AVX512F, AVX512BW
   * - ``make avx512vbmi``
     - ``verify_avx512vbmi``, ``check_avx512vbmi``, ``benchmark_avx512vbmi``
     - scalar, SSE, BMI2, AVX2, AVX512F, AVX512BW, AVX512VBMI
   * - ``make xop``
     - ``verify_xop``, ``check_xop``, ``speed_xop``, ``benchmark_xop``
     - scalar, SSE and AMD XOP
   * - ``make arm``
     - ``verify_arm``, ``check_arm``, ``speed_arm``
     - scalar, ARM Neon

Type ``make run`` (for SSE) or ``make run_ARCH`` to run all programs for the given
instruction sets; ``ARCH`` can be ``sse``, ``avx2``, ``avx512``, ``avx512bw``,
``avx512vbmi`` or ``avx512vl``.

BMI2 presence is determined based on ``/proc/cpuinfo`` or a counterpart.
When an AVX2 or AVX512 target is used, BMI2 is enabled by default.

AVX512
--------------------------------------------------

To compile the AVX512 versions of the programs, at least GCC 5.3 is required.
GCC 4.9.2 doesn't have AVX512 support.

Please download the `Intel Software Development Emulator`__ in order to run the
AVX512 variants via ``make run_avx512``, ``run_avx512bw`` or ``run_avx512vbmi``.
The emulator path should be added to the ``PATH``.

__ https://software.intel.com/en-us/articles/intel-software-development-emulator

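
Whether a given variant can run natively or needs the emulator depends on what
the host CPU reports. The following is a minimal sketch of such a check, not
code from this repository: it assumes GCC or Clang on x86 (the
``__builtin_cpu_supports`` builtin is compiler-specific), and the helper name
``native_targets`` is hypothetical. The mapping of the ``avx512`` target to
AVX512F follows the table in the Building section; ``sse`` is left out because
the exact SSE level required is not stated here.

.. code-block:: cpp

   #include <string>
   #include <vector>

   // Returns the ARCH names (as used by "make run_ARCH") whose instruction
   // set the current CPU reports as supported.
   // Hypothetical helper for illustration; not part of base64simd.
   std::vector<std::string> native_targets() {
       std::vector<std::string> result;
   #if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
       // Initialize CPU detection. GCC only requires this when running
       // before static constructors, but calling it is harmless.
       __builtin_cpu_init();
       // The builtin needs a string literal, hence one call per feature.
       if (__builtin_cpu_supports("avx2"))       result.push_back("avx2");
       if (__builtin_cpu_supports("avx512f"))    result.push_back("avx512");
       if (__builtin_cpu_supports("avx512bw"))   result.push_back("avx512bw");
       if (__builtin_cpu_supports("avx512vbmi")) result.push_back("avx512vbmi");
   #endif
       // On other compilers or architectures the list stays empty.
       return result;
   }

Targets missing from the returned list are the ones that would have to be run
under the emulator on that machine.
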

Known problems
--------------------------------------------------

Neither encoding nor decoding fully matches the base64 specification:
the data tail is not processed, i.e. the encoder never produces
'=' chars at the end, and the decoder doesn't handle them at all.

All these shortcomings are absent from a brilliant library
by Alfred Klomp: https://github.com/aklomp/base64.

See also
--------------------------------------------------

* Daniel's benchmarks and comparison with state-of-the-art solutions:
  https://github.com/lemire/fastbase64

Who uses our algorithms?
--------------------------------------------------

* C/C++ library by **Alfred Klomp** https://github.com/aklomp/base64
* .NET library by **Günther Foidl** https://github.com/gfoidl/Base64
* there was an attempt to include an assembly implementation in Go:
  https://github.com/golang/go/issues/20206
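
To make the limitation listed under "Known problems" concrete, here is a
minimal scalar sketch of what handling the data tail involves on the encoding
side. It is not code from this repository: it assumes the standard alphabet and
``=`` padding from RFC 4648, and the name ``encode_scalar`` is hypothetical.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>
   #include <string>

   // Scalar base64 encoder with RFC 4648 '=' padding.
   // Hypothetical helper for illustration; not part of base64simd.
   std::string encode_scalar(const uint8_t* data, size_t size) {
       static const char alphabet[] =
           "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

       std::string out;
       out.reserve((size + 2) / 3 * 4);

       size_t i = 0;
       // Full 3-byte groups: each one yields four 6-bit indices.
       for (; i + 3 <= size; i += 3) {
           const uint32_t w = (uint32_t(data[i]) << 16)
                            | (uint32_t(data[i + 1]) << 8)
                            |  uint32_t(data[i + 2]);
           out += alphabet[(w >> 18) & 0x3f];
           out += alphabet[(w >> 12) & 0x3f];
           out += alphabet[(w >> 6) & 0x3f];
           out += alphabet[w & 0x3f];
       }

       // Data tail: 1 or 2 leftover bytes, completed with '=' padding.
       const size_t rest = size - i;
       if (rest == 1) {
           const uint32_t w = uint32_t(data[i]) << 16;
           out += alphabet[(w >> 18) & 0x3f];
           out += alphabet[(w >> 12) & 0x3f];
           out += "==";
       } else if (rest == 2) {
           const uint32_t w = (uint32_t(data[i]) << 16)
                            | (uint32_t(data[i + 1]) << 8);
           out += alphabet[(w >> 18) & 0x3f];
           out += alphabet[(w >> 12) & 0x3f];
           out += alphabet[(w >> 6) & 0x3f];
           out += '=';
       }
       return out;
   }

With this routine the 3-byte input ``Man`` encodes to ``TWFu``, the 2-byte input
``Ma`` to ``TWE=`` and the 1-byte input ``M`` to ``TQ==``. The last two cases are
the ones the vectorized encoders here do not produce.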