Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
Awesome Lists | Featured Topics | Projects
https://github.com/google/highwayhash

Fast strong hash functions: SipHash/HighwayHash
https://github.com/google/highwayhash
Last synced: 3 days ago
JSON representation
Fast strong hash functions: SipHash/HighwayHash
Host: GitHub
URL: https://github.com/google/highwayhash
Owner: google
License: apache-2.0
Created: 2016-02-17T16:29:40.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2024-04-18T16:08:17.000Z (9 months ago)
Last Synced: 2025-01-11T02:06:01.474Z (10 days ago)
Language: C++
Homepage:
Size: 576 KB
Stars: 1,555
Watchers: 50
Forks: 187
Open Issues: 8
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING
- License: LICENSE
Awesome Lists containing this project

README

        Strong (well-distributed and unpredictable) hashes:

*   Portable implementation of

    [SipHash](https://www.131002.net/siphash/siphash.pdf)

*   HighwayHash, a 5x faster SIMD hash with [security

    claims](https://arxiv.org/abs/1612.06257)

## Quick Start

To build on a Linux or Mac platform, simply run `make`. For Windows, we provide

a Visual Studio 2015 project in the `msvc` subdirectory.

Run `benchmark` for speed measurements. `sip_hash_test` and `highwayhash_test`

ensure the implementations return known-good values for a given set of inputs.

64-bit SipHash for any CPU:

```

    #include "highwayhash/sip_hash.h"

    using namespace highwayhash;

    HH_ALIGNAS(16) const HH_U64 key2[2] = {1234, 5678};

    char in[8] = {1};

    return SipHash(key2, in, 8);

```

64, 128 or 256 bit HighwayHash for the CPU determined by compiler flags:

```

    #include "highwayhash/highwayhash.h"

    using namespace highwayhash;

    HH_ALIGNAS(32) const HHKey key = {1, 2, 3, 4};

    char in[8] = {1};

    HHResult64 result;  // or HHResult128 or HHResult256

    HHStateT state(key);

    HighwayHashT(&state, in, 8, &result);

```

64, 128 or 256 bit HighwayHash for the CPU on which we're currently running:

```

    #include "highwayhash/highwayhash_target.h"

    #include "highwayhash/instruction_sets.h"

    using namespace highwayhash;

    HH_ALIGNAS(32) const HHKey key = {1, 2, 3, 4};

    char in[8] = {1};

    HHResult64 result;  // or HHResult128 or HHResult256

    InstructionSets::Run(key, in, 8, &result);

```

C-callable 64-bit HighwayHash for the CPU on which we're currently running:

    #include "highwayhash/c_bindings.h"

    const uint64_t key[4] = {1, 2, 3, 4};

    char in[8] = {1};

    return HighwayHash64(key, in, 8);

Printing a 256-bit result in a hexadecimal format similar to sha1sum:

    HHResult256 result;

    printf("%016"PRIx64"%016"PRIx64"%016"PRIx64"%016"PRIx64"\n",

         result[3], result[2], result[1], result[0]);

## Introduction

Hash functions are widely used, so it is desirable to increase their speed and

security. This package provides two 'strong' (well-distributed and

unpredictable) hash functions: a faster version of SipHash, and an even faster

algorithm we call HighwayHash.

SipHash is a fast but 'cryptographically strong' pseudo-random function by

Aumasson and Bernstein [https://www.131002.net/siphash/siphash.pdf].

HighwayHash is a new way of mixing inputs which may inspire new

cryptographically strong hashes. Large inputs are processed at a rate of 0.24

cycles per byte, and latency remains low even for small inputs. HighwayHash is

faster than SipHash for all input sizes, with 5 times higher throughput at 1

KiB. We discuss design choices and provide statistical analysis and preliminary

cryptanalysis in https://arxiv.org/abs/1612.06257.

## Applications

Unlike prior strong hashes, these functions are fast enough to be recommended

as safer replacements for weak hashes in many applications. The additional CPU

cost appears affordable, based on profiling data indicating C++ hash functions

account for less than 0.25% of CPU usage.

Hash-based selection of random subsets is useful for A/B experiments and similar

applications. Such random generators are idempotent (repeatable and

deterministic), which is helpful for parallel algorithms and testing. To avoid

bias, it is important that the hash function be unpredictable and

indistinguishable from a uniform random generator. We have verified the bit

distribution and avalanche properties of SipHash and HighwayHash.

64-bit hashes are also useful for authenticating short-lived messages such as

network/RPC packets. This requires that the hash function withstand

differential, length extension and other attacks. We have published a formal

security analysis for HighwayHash. New cryptanalysis tools may still need to be

developed for further analysis.

Strong hashes are also important parts of methods for protecting hash tables

against unacceptable worst-case behavior and denial of service attacks

(see "hash flooding" below).

128 and 256-bit hashes can be useful for verifying data integrity (checksums).

## SipHash

Our SipHash implementation is a fast and portable drop-in replacement for

the reference C code. Outputs are identical for the given test cases (messages

between 0 and 63 bytes).

Interestingly, it is about twice as fast as a SIMD implementation using SSE4.1

(https://goo.gl/80GBSD). This is presumably due to the lack of SIMD bit rotate

instructions prior to AVX-512.

SipHash13 is a faster but weaker variant with one mixing round per update and

three during finalization.

We also provide a data-parallel 'tree hash' variant that enables efficient SIMD

while retaining safety guarantees. This is about twice as fast as SipHash, but

does not return the same results.

## HighwayHash

We have devised a new way of mixing inputs with SIMD multiply and permute

instructions. The multiplications are 32x32 -> 64 bits and therefore infeasible

to reverse. Permuting equalizes the distribution of the resulting bytes.

The internal state is quite large (1024 bits) but fits within SIMD registers.

Due to limitations of the AVX2 instruction set, the registers are partitioned

into two 512-bit halves that remain independent until the reduce phase. The

algorithm outputs 64 bit digests or up to 256 bits at no extra cost.

In addition to high throughput, the algorithm is designed for low finalization

cost. The result is more than twice as fast as SipTreeHash.

We also provide an SSE4.1 version (80% as fast for large inputs and 95% as fast

for short inputs), an implementation for VSX on POWER and a portable version

(10% as fast). A third-party ARM implementation is referenced below.

Statistical analyses and preliminary cryptanalysis are given in

https://arxiv.org/abs/1612.06257.

## Versioning and stability

Now that 21 months have elapsed since their initial release, we have declared

all (64/128/256 bit) variants of HighwayHash frozen, i.e. unchanging forever.

SipHash and HighwayHash are 'fingerprint functions' whose input -> hash

mapping will not change. This is important for applications that write hashes to

persistent storage.

## Speed measurements

To measure the CPU cost of a hash function, we can either create an artificial

'microbenchmark' (easier to control, but probably not representative of the

actual runtime), or insert instrumentation directly into an application (risks

influencing the results through observer overhead). We provide novel variants of

both approaches that mitigate their respective disadvantages.

profiler.h uses software write-combining to stream program traces to memory

with minimal overhead. These can be analyzed offline, or when memory is full,

to learn how much time was spent in each (possibly nested) zone.

nanobenchmark.h enables cycle-accurate measurements of very short functions.

It uses CPU fences and robust statistics to minimize variability, and also

avoids unrealistic branch prediction effects.

We compile the 64-bit C++ implementations with a patched GCC 4.9 and run on a

single idle core of a Xeon E5-2690 v3 clocked at 2.6 GHz. CPU cost is measured

as cycles per byte for various input sizes:

Algorithm        | 8     | 31   | 32   | 63   | 64   | 1024

---------------- | ----- | ---- | ---- | ---- | ---- | ----

HighwayHashAVX2  | 7.34  | 1.81 | 1.71 | 1.04 | 0.95 | 0.24

HighwayHashSSE41 | 8.00  | 2.11 | 1.75 | 1.13 | 0.96 | 0.30

SipTreeHash      | 16.51 | 4.57 | 4.09 | 2.22 | 2.29 | 0.57

SipTreeHash13    | 12.33 | 3.47 | 3.06 | 1.68 | 1.63 | 0.33

SipHash          | 8.13  | 2.58 | 2.73 | 1.87 | 1.93 | 1.26

SipHash13        | 6.96  | 2.09 | 2.12 | 1.32 | 1.33 | 0.68

SipTreeHash is slower than SipHash for small inputs because it processes blocks

of 32 bytes. AVX2 and SSE4.1 HighwayHash are faster than SipHash for all input

sizes due to their highly optimized handling of partial vectors.

Note that previous measurements included the initialization of their input,

which dramatically increased timings especially for small inputs.

## CPU requirements

SipTreeHash(13) requires an AVX2-capable CPU (e.g. Haswell). HighwayHash

includes a dispatcher that chooses the implementation (AVX2, SSE4.1, VSX or

portable)  at runtime, as well as a directly callable function template that can

only run on the CPU for which it was built. SipHash(13) and

ScalarSipTreeHash(13) have no particular CPU requirements.

### AVX2 vs SSE4

When both AVX2 and SSE4 are available, the decision whether to use AVX2 is

non-obvious. AVX2 vectors are twice as wide, but require a higher power license

(integer multiplications count as 'heavy' instructions) and can thus reduce the

clock frequency of the core or entire socket(!) on Haswell systems. This

partially explains the observed 1.25x (not 2x) speedup over SSE4. Moreover, it

is inadvisable to only sporadically use AVX2 instructions because there is also

a ~56K cycle warmup period during which AVX2 operations are slower, and Haswell

can even stall during this period. Thus, we recommend avoiding AVX2 for

infrequent hashing if the rest of the application is also not using AVX2. For

any input larger than 1 MiB, it is probably worthwhile to enable AVX2.

### SIMD implementations

Our x86 implementations use custom vector classes with overloaded operators

(e.g. `const V4x64U a = b + c`) for type-safety and improved readability vs.

compiler intrinsics (e.g. `const __m256i a = _mm256_add_epi64(b, c)`).

The VSX implementation uses built-in vector types alongside Altivec intrinsics.

A high-performance third-party ARM implementation is mentioned below.

### Dispatch

Our instruction_sets dispatcher avoids running newer instructions on older CPUs

that do not support them. However, intrinsics, and therefore also any vector

classes that use them, require (on GCC < 4.9 or Clang < 3.9) a compiler flag

that also allows the compiler to generate code for that CPU. This means the

intrinsics must be placed in separate translation units that are compiled with

the required flags. It is important that these source files and their headers

not define any inline functions, because that might break the one definition

rule and cause crashes.

To minimize dispatch overhead when hashes are computed often (e.g. in a loop),

we can inline the hash function into its caller using templates. The dispatch

overhead will only be paid once (e.g. before the loop). The template mechanism

also avoids duplicating code in each CPU-specific implementation.

## Defending against hash flooding

To mitigate hash flooding attacks, we need to take both the hash function and

the data structure into account.

We wish to defend (web) services that utilize hash sets/maps against

denial-of-service attacks. Such data structures assign attacker-controlled

input messages `m` to a hash table bin `b` by computing the hash `H(s, m)`

using a hash function `H` seeded by `s`, and mapping it to a bin with some

narrowing function `b = R(h)`, discussed below.

Attackers may attempt to trigger 'flooding' (excessive work in insertions or

lookups) by finding multiple `m` that map to the same bin. If the attacker has

local access, they can do far worse, so we assume the attacker can only issue

remote requests. If the attacker is able to send large numbers of requests,

they can already deny service, so we need only ensure the attacker's cost is

sufficiently large compared to the service's provisioning.

If the hash function is 'weak', attackers can easily generate 'hash collisions'

(inputs mapping to the same hash values) that are independent of the seed. In

other words, certain input messages will cause collisions regardless of the seed

value. The author of SipHash has published C++ programs to generate such

'universal (key-independent) multicollisions' for CityHash and Murmur. Similar

'differential' attacks are likely possible for any hash function consisting only

of reversible operations (e.g. addition/multiplication/rotation) with a constant

operand. `n` requests with such inputs cause `n^2` work for an unprotected hash

table, which is unacceptable.

By contrast, 'strong' hashes such as SipHash or HighwayHash require infeasible

attacker effort to find a hash collision (an expected 2^32 guesses of `m` per

the birthday paradox) or recover the seed (2^63 requests). These security claims

assume the seed is secret. It is reasonable to suppose `s` is initially unknown

to attackers, e.g. generated on startup or even per-connection. A timing attack

by Wool/Bar-Yosef recovers 13-bit seeds by testing all 8K possibilities using

millions of requests, which takes several days (even assuming unrealistic 150 us

round-trip times). It appears infeasible to recover 64-bit seeds in this way.

However, attackers are only looking for multiple `m` mapping to the same bin

rather than identical hash values. We assume they know or are able to discover

the hash table size `p`. It is common to choose `p = 2^i` to enable an efficient

`R(h) := h & (p - 1)`, which simply retains the lower hash bits. It may be

easier for attackers to compute partial collisions where only the lower `i` bits

match. This can be prevented by choosing a prime `p` so that `R(h) := h % p`

incorporates all hash bits. The costly modulo operation can be avoided by

multiplying with the inverse (https://goo.gl/l7ASm8). An interesting alternative

suggested by Kyoung Jae Seo chooses a random subset of the `h` bits. Such an `R`

function can be computed in just 3 cycles using PEXT from the BMI2 instruction

set. This is expected to defend against SAT-solver attacks on the hash bits at a

slightly lower cost than the multiplicative inverse method, and still allows

power-of-two table sizes.

Summary thus far: given a strong hash function and secret seed, it appears

infeasible for attackers to generate hash collisions because `s` and/or `R` are

unknown. However, they can still observe the timings of data structure

operations for various `m`. With typical table sizes of 2^10 to 2^17 entries,

attackers can detect some 'bin collisions' (inputs mapping to the same bin).

Although this will be costly for the attacker, they can then send many instances

of such inputs, so we need to limit the resulting work for our data structure.

Hash tables with separate chaining typically store bin entries in a linked list,

so worst-case inputs lead to unacceptable linear-time lookup cost. We instead

seek optimal asymptotic worst-case complexity for each operation (insertion,

deletion and lookups), which is a constant factor times the logarithm of the

data structure size. This naturally leads to a tree-like data structure for each

bin. The Java8 HashMap only replaces its linked list with trees when needed.

This leads to additional cost and complexity for deciding whether a bin is a

list or tree.

Our first proposal (suggested by Github user funny-falcon) avoids this overhead

by always storing one tree per bin. It may also be worthwhile to store the first

entry directly in the bin, which avoids allocating any tree nodes in the common

case where bins are sparsely populated. What kind of tree should be used?

Given SipHash and HighwayHash provide high quality randomness, depending on

expecting attack surface simple non-balancing binary search tree could perform

reasonably well. [Wikipedia says](https://en.wikipedia.org/wiki/Binary_search_tree#Definition)

> After a long intermixed sequence of random insertion and deletion, the

> expected height of the tree approaches square root of the number of keys, √n,

> which grows much faster than log n.

While `O(√n)` is much larger than `O(log n)`, it is still much smaller than `O(n)`.

And it will certainly complicate the timing attack, since the time of operation

on collisioned bin will grow slower.

If stronger safety guarantees are needed, then a balanced tree should be used.

Scapegoat and splay trees only offer amortized complexity guarantees, whereas

treaps require an entropy source and have higher constant factors in practice.

Self-balancing structures such as 2-3 or red-black trees require additional

bookkeeping information. We can hope to reduce rebalancing cost by realizing

that the output bits of strong `H` functions are uniformly distributed. When

using them as keys instead of the original message `m`, recent relaxed balancing

schemes such as left-leaning red-black or weak AVL trees may require fewer tree

rotations to maintain their invariants. Note that `H` already determines the

bin, so we should only use the remaining bits. 64-bit hashes are likely

sufficient for this purpose, and HighwayHash generates up to 256 bits. It seems

unlikely that attackers can craft inputs resulting in worst cases for both the

bin index and tree key without being able to generate hash collisions, which

would contradict the security claims of strong hashes. Even if they succeed, the

relaxed tree balancing still guarantees an upper bound on height and therefore

the worst-case operation cost. For the AVL variant, the constant factors are

slightly lower than for red-black trees.

The second proposed approach uses augmented/de-amortized cuckoo hash tables

(https://goo.gl/PFwwkx). These guarantee worst-case `log n` bounds for all

operations, but only if the hash function is 'indistinguishable from random'

(uniformly distributed regardless of the input distribution), which is claimed

for SipHash and HighwayHash but certainly not for weak hashes.

Both alternatives retain good average case performance and defend against

flooding by limiting the amount of extra work an attacker can cause. The first

approach guarantees an upper bound of `log n` additional work even if the hash

function is compromised.

In summary, a strong hash function is not, by itself, sufficient to protect a

chained hash table from flooding attacks. However, strong hash functions are

important parts of two schemes for preventing denial of service. Using weak hash

functions can slightly accelerate the best-case and average-case performance of

a service, but at the risk of greatly reduced attack costs and worst-case

performance.

## Third-party implementations / bindings

Thanks to Damian Gryski and Frank Wessels for making us aware of these

third-party implementations or bindings. Please feel free to get in touch or

raise an issue and we'll add yours as well.

By | Language | URL

--- | --- | ---

Damian Gryski | Go and x64 assembly | https://github.com/dgryski/go-highway/

Simon Abdullah | NPM package | https://www.npmjs.com/package/highwayhash-nodejs

Lovell Fuller | node.js bindings | https://github.com/lovell/highwayhash

Andreas Sonnleitner | [WebAssembly](https://github.com/asonnleitner/highwayhash-wasm) and NPM package | https://www.npmjs.com/package/highwayhash-wasm

Nick Babcock | Rust port | https://github.com/nickbabcock/highway-rs

Caleb Zulawski | Rust portable SIMD | https://github.com/calebzulawski/autobahn-hash

Vinzent Steinberg | Rust bindings | https://github.com/vks/highwayhash-rs

Frank Wessels & Andreas Auernhammer | Go and ARM assembly | https://github.com/minio/highwayhash

Phil Demetriou | Python 3 bindings | https://github.com/kpdemetriou/highwayhash-cffi

Jonathan Beard | C++20 constexpr | https://gist.github.com/jonathan-beard/632017faa1d9d1936eb5948ac9186657

James Cook | Ruby bindings | https://github.com/jamescook/highwayhash

John Platts | C++17 Google Highway port | https://github.com/johnplatts/simdhwyhash

## Modules

### Hashes

*   c_bindings.h declares C-callable versions of SipHash/HighwayHash.

*   sip_hash.cc is the compatible implementation of SipHash, and also provides

    the final reduction for sip_tree_hash.

*   sip_tree_hash.cc is the faster but incompatible SIMD j-lanes tree hash.

*   scalar_sip_tree_hash.cc is a non-SIMD version.

*   state_helpers.h simplifies the implementation of the SipHash variants.

*   highwayhash.h is our new, fast hash function.

*   hh_{avx2,sse41,vsx,portable}.h are its various implementations.

*   highwayhash_target.h chooses the best available implementation at runtime.

### Infrastructure

*   arch_specific.h offers byte swapping and CPUID detection.

*   compiler_specific.h defines some compiler-dependent language extensions.

*   data_parallel.h provides a C++11 ThreadPool and PerThread (similar to

    OpenMP).

*   instruction_sets.h and targets.h enable efficient CPU-specific dispatching.

*   nanobenchmark.h measures elapsed times with < 1 cycle variability.

*   os_specific.h sets thread affinity and priority for benchmarking.

*   profiler.h is a low-overhead, deterministic hierarchical profiler.

*   tsc_timer.h obtains high-resolution timestamps without CPU reordering.

*   vector256.h and vector128.h contain wrapper classes for AVX2 and SSE4.1.

By Jan Wassenberg  and Jyrki Alakuijala

, updated 2023-03-29

This is not an official Google product.