Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bwesterb/go-and
Fast bitwise operations on byte-slices
https://github.com/bwesterb/go-and
avx2 bitwise-and bitwise-andnot bitwise-or bitwise-xor neon popcnt
Last synced: 2 months ago
JSON representation
Fast bitwise operations on byte-slices
- Host: GitHub
- URL: https://github.com/bwesterb/go-and
- Owner: bwesterb
- License: mit
- Created: 2024-07-09T21:54:02.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-28T13:44:18.000Z (4 months ago)
- Last Synced: 2024-08-28T15:09:52.660Z (4 months ago)
- Topics: avx2, bitwise-and, bitwise-andnot, bitwise-or, bitwise-xor, neon, popcnt
- Language: Go
- Homepage:
- Size: 56.6 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
go-and
======[![Go Reference](https://pkg.go.dev/badge/github.com/bwesterb/go-and.svg)](https://pkg.go.dev/github.com/bwesterb/go-and)
Fast bitwise and, or, xor, andn, popcount and memset for `[]byte` slices.
```go
import "github.com/bwesterb/go-and"func main() {
var a, b, dst []byte// ...
// Sets dst to the bitwise and of a and b
and.And(dst, a, b)
}
```Makes use of AVX2 on AMD64 and NEON on ARM64.
## Benchmarks
Created using `./benchmark.sh`.
This shows three benchmarks:
* `naive` is a simple for loop doing one byte at a time.
* `purego` are our slightly optimized versions that work on uint64s instead of bytes.
* `asm` are the AVX2 implementations and the reason to use this library.### Intel Core i9-13900
```
goos: linux
goarch: amd64
pkg: github.com/bwesterb/go-and
cpu: 13th Gen Intel(R) Core(TM) i9-13900
│ naive │ purego │ asm │
│ sec/op │ sec/op vs base │ sec/op vs base │
And-32 8162.0n ± 5% 2034.5n ± 1% -75.07% (p=0.000 n=10) 518.4n ± 1% -93.65% (p=0.000 n=10)
Or-32 9751.5n ± 8% 2104.5n ± 3% -78.42% (p=0.000 n=10) 515.0n ± 1% -94.72% (p=0.000 n=10)
Xor-32 8112.5n ± 3% 2029.0n ± 0% -74.99% (p=0.000 n=10) 518.6n ± 1% -93.61% (p=0.000 n=10)
AndNot-32 10685.5n ± 4% 2292.0n ± 2% -78.55% (p=0.000 n=10) 517.8n ± 1% -95.15% (p=0.000 n=10)
Memset-32 167.96µ ± 0% 57.54µ ± 1% -65.74% (p=0.000 n=10) 15.65µ ± 1% -90.68% (p=0.000 n=10)
Popcnt-32 132.15µ ± 1% 71.63µ ± 1% -45.80% (p=0.000 n=10) 36.51µ ± 2% -72.37% (p=0.000 n=10)
geomean 23.13µ 6.592µ -71.50% 1.857µ -91.97%│ naive │ purego │ asm │
│ B/s │ B/s vs base │ B/s vs base │
And-32 3.651Gi ± 5% 14.649Gi ± 1% +301.20% (p=0.000 n=10) 57.488Gi ± 1% +1474.44% (p=0.000 n=10)
Or-32 3.057Gi ± 8% 14.163Gi ± 3% +363.37% (p=0.000 n=10) 57.872Gi ± 1% +1793.33% (p=0.000 n=10)
Xor-32 3.674Gi ± 3% 14.690Gi ± 0% +299.88% (p=0.000 n=10) 57.469Gi ± 1% +1464.38% (p=0.000 n=10)
AndNot-32 2.789Gi ± 4% 13.003Gi ± 2% +366.21% (p=0.000 n=10) 57.558Gi ± 1% +1963.74% (p=0.000 n=10)
Memset-32 5.545Gi ± 0% 16.187Gi ± 1% +191.91% (p=0.000 n=10) 59.504Gi ± 1% +973.11% (p=0.000 n=10)
Popcnt-32 7.048Gi ± 1% 13.002Gi ± 1% +84.48% (p=0.000 n=10) 25.507Gi ± 2% +261.92% (p=0.000 n=10)
geomean 4.058Gi 14.24Gi +250.89% 50.56Gi +1145.76%
```### Apple M2 Pro
```
goos: darwin
goarch: arm64
pkg: github.com/bwesterb/go-and
│ naive │ purego │ asm │
│ sec/op │ sec/op vs base │ sec/op vs base │
And-12 9891.5n ± 0% 2248.0n ± 1% -77.27% (p=0.000 n=10) 511.2n ± 1% -94.83% (p=0.000 n=10)
Or-12 9824.5n ± 1% 2267.5n ± 1% -76.92% (p=0.000 n=10) 511.2n ± 1% -94.80% (p=0.000 n=10)
Xor-12 9826.5n ± 0% 2232.0n ± 1% -77.29% (p=0.000 n=10) 509.8n ± 1% -94.81% (p=0.000 n=10)
AndNot-12 9928.5n ± 1% 2251.0n ± 1% -77.33% (p=0.000 n=10) 581.7n ± 0% -94.14% (p=0.000 n=10)
Memset-12 294.96µ ± 1% 79.18µ ± 2% -73.15% (p=0.000 n=10) 10.83µ ± 7% -96.33% (p=0.000 n=10)
Popcnt-12 288.90µ ± 1% 45.24µ ± 0% -84.34% (p=0.000 n=10) 11.30µ ± 0% -96.09% (p=0.000 n=10)
geomean 30.52µ 6.716µ -77.99% 1.455µ -95.23%│ naive │ purego │ asm │
│ B/s │ B/s vs base │ B/s vs base │
And-12 3.013Gi ± 0% 13.256Gi ± 1% +339.97% (p=0.000 n=10) 58.297Gi ± 1% +1834.87% (p=0.000 n=10)
Or-12 3.033Gi ± 1% 13.144Gi ± 1% +333.30% (p=0.000 n=10) 58.303Gi ± 1% +1821.98% (p=0.000 n=10)
Xor-12 3.033Gi ± 0% 13.353Gi ± 1% +340.28% (p=0.000 n=10) 58.462Gi ± 1% +1827.64% (p=0.000 n=10)
AndNot-12 3.002Gi ± 1% 13.240Gi ± 1% +341.07% (p=0.000 n=10) 51.233Gi ± 0% +1606.77% (p=0.000 n=10)
Memset-12 3.157Gi ± 1% 11.761Gi ± 2% +272.50% (p=0.000 n=10) 86.041Gi ± 7% +2625.05% (p=0.000 n=10)
Popcnt-12 3.224Gi ± 1% 20.585Gi ± 0% +538.57% (p=0.000 n=10) 82.407Gi ± 0% +2456.30% (p=0.000 n=10)
geomean 3.076Gi 13.98Gi +354.43% 64.53Gi +1997.80%
```