https://github.com/desmondyeung/scala-hashing
Fast non-cryptographic hash functions for Scala
https://github.com/desmondyeung/scala-hashing
hash-functions hashing murmurhash3 scala xxhash
Last synced: 5 months ago
JSON representation
Fast non-cryptographic hash functions for Scala
- Host: GitHub
- URL: https://github.com/desmondyeung/scala-hashing
- Owner: desmondyeung
- License: apache-2.0
- Created: 2019-08-20T01:51:30.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-08-26T12:59:00.000Z (almost 7 years ago)
- Last Synced: 2023-06-30T04:40:23.817Z (almost 3 years ago)
- Topics: hash-functions, hashing, murmurhash3, scala, xxhash
- Language: Scala
- Homepage:
- Size: 134 KB
- Stars: 69
- Watchers: 5
- Forks: 5
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Scala-Hashing
[](https://travis-ci.com/desmondyeung/scala-hashing)
[](http://codecov.io/github/desmondyeung/scala-hashing?branch=master)
[](https://maven-badges.herokuapp.com/maven-central/com.desmondyeung.hashing/scala-hashing_2.13)
## Overview
Fast non-cryptographic hash functions for Scala. This library provides APIs for computing 32-bit and 64-bit hashes.
Currently implemented hash functions
* [MurmurHash3](https://github.com/aappleby/smhasher) (32-bit)
* [XxHash](https://github.com/Cyan4973/xxHash) (32-bit and 64-bit)
Hash functions in this library can be accessed via either a standard API for hashing primitives, byte arrays, or Java ByteBuffers (direct and non-direct), or a streaming API for hashing stream-like objects such as InputStreams, Java NIO Channels, or Akka Streams. Hash functions should produce consistent output regardless of platform or endianness.
This library uses the `sun.misc.Unsafe` API internally. I might explore using the `VarHandle` API introduced in Java 9 in the future, but am currently still supporting Java 8.
## Performance
Benchmarked against various other open-source implementations
* [Guava](https://github.com/google/guava) (MurmurHash3)
* [LZ4 Java](https://github.com/lz4/lz4-java) (XxHash32 and XxHash64 - Includes JNI binding, pure Java, and Java+Unsafe implementations)
* [Scala](https://github.com/scala/scala) (Scala's built-in `scala.util.hashing.MurmurHash3`)
* [Zero-Allocation-Hashing](https://github.com/OpenHFT/Zero-Allocation-Hashing) (XxHash64)
### MurmurHash3_32

### XxHash32

### XxHash64

### Running Locally
Benchmarks are located in the `bench` subproject and can be run using the [sbt-jmh](https://github.com/ktoso/sbt-jmh) plugin.
To run all benchmarks with default settings
```sbt
bench/jmh:run
```
To run a specific benchmark with custom settings
```sbt
bench/jmh:run -f 2 -wi 5 -i 5 XxHash64Bench
```
## Getting Started
```scala
libraryDependencies += "com.desmondyeung.hashing" %% "scala-hashing" % "0.1.0"
```
### Examples
This library defines the interfaces `Hash32` and `StreamingHash32` for computing 32-bit hashes and `Hash64` and `StreamingHash64` for computing 64-bit hashes. Classes extending `StreamingHash32` or `StreamingHash64` are not thread-safe.
The public API for `Hash64` and `StreamingHash64` can be seen below
```scala
trait Hash64 {
def hashByte(input: Byte, seed: Long): Long
def hashInt(input: Int, seed: Long): Long
def hashLong(input: Long, seed: Long): Long
def hashByteArray(input: Array[Byte], seed: Long): Long
def hashByteArray(input: Array[Byte], offset: Int, length: Int, seed: Long): Long
def hashByteBuffer(input: ByteBuffer, seed: Long): Long
def hashByteBuffer(input: ByteBuffer, offset: Int, length: Int, seed: Long): Long
}
trait StreamingHash64 {
def reset(): Unit
def value: Long
def updateByteArray(input: Array[Byte], offset: Int, length: Int): Unit
def updateByteBuffer(input: ByteBuffer, offset: Int, length: Int): Unit
}
```
Using the standard API
```scala
import com.desmondyeung.hashing.XxHash64
import java.nio.ByteBuffer
// hash a long
val hash = XxHash64.hashLong(123, seed = 0)
// hash a Array[Byte]
val hash = XxHash64.hashByteArray(Array[Byte](123), seed = 0)
// hash a ByteBuffer
val hash = XxHash64.hashByteBuffer(ByteBuffer.wrap(Array[Byte](123)), seed = 0)
```
Using the streaming API
```scala
import com.desmondyeung.hashing.StreamingXxHash64
import java.nio.ByteBuffer
import java.io.FileInputStream
val checksum = StreamingXxHash64(seed = 0)
val channel = new FileInputStream("/path/to/file.txt").getChannel
val chunk = ByteBuffer.allocate(1024)
var bytesRead = channel.read(chunk)
while (bytesRead > 0) {
checksum.updateByteBuffer(chunk, 0, bytesRead)
chunk.rewind
bytesRead = channel.read(chunk)
}
val hash = checksum.value
```
## License
Licensed under the Apache License, Version 2.0 (the "License").