https://github.com/sslotin/amh-code
Complete implementations from "Algorithms for Modern Hardware"
algorithms computer-science hpc performance
- Host: GitHub
- URL: https://github.com/sslotin/amh-code
- Owner: sslotin
- Created: 2020-12-10T11:20:06.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-12-11T14:30:07.000Z (over 2 years ago)
- Last Synced: 2025-03-28T16:07:07.174Z (3 months ago)
- Topics: algorithms, computer-science, hpc, performance
- Language: Jupyter Notebook
- Homepage: https://en.algorithmica.org/hpc
- Size: 8.86 MB
- Stars: 738
- Watchers: 28
- Forks: 46
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Algorithms for Modern Hardware
This repository contains full examples and other associated code from https://en.algorithmica.org/hpc.
The book is still unfinished, and my writing process is very slow and non-sequential (sometimes the "idea → code → benchmarks → article" pipeline takes 6 months or even more), so in this repository you can get a preview of many interesting things that I haven't yet properly written up and published.
Things that improve on the state of the art:
- Many variants of [binary search](https://github.com/sslotin/amh-code/tree/main/binsearch), the [fastest one](https://github.com/sslotin/amh-code/blob/main/binsearch/bplus.cc) achieving ~15x speedup over `std::lower_bound` for small arrays (that fit in cache) and ~8x speedup for large arrays (>1e6).
- [Argmin at the speed of memory](https://github.com/sslotin/amh-code/blob/main/argmin/simdmin.cc).
- Implementation of [the Floyd-Warshall algorithm](https://github.com/sslotin/amh-code/tree/main/floyd) that is about 50x faster than the naive "for-for-for" algorithm.

Things that match the current state of the art:
- [A version of a segment tree](https://github.com/sslotin/amh-code/blob/main/segtree/refactor2.cc) that can compute prefix sums in ~2ns plus the time of the slowest memory read.
- (✓ published) [An implementation of GCD](https://en.algorithmica.org/hpc/analyzing-performance/gcd/) that works 2-3 times faster than `std::gcd`.
- [Integer factorization](https://github.com/sslotin/amh-code/blob/main/factor/montgomery.cc) taking ~0.5ms per 60-bit integer.
- An algorithm for parsing series of integers ~2x faster than `scanf("%d")` does.
- An implementation of [BLAS-level matrix multiplication](https://github.com/sslotin/amh-code/blob/main/matmul/v6.cc) that can be expressed in [about 30 lines of C](https://gist.github.com/sslotin/fae39ea49a812732ae45db7b72f6a7ff).
- Various efficient [hash tables](https://github.com/sslotin/amh-code/tree/main/hash-tables).
- Efficient [FFT](https://github.com/sslotin/amh-code/tree/main/fft) and Karatsuba algorithm implementations.

Various benchmarks:
- Benchmarks for [branching and predication](https://github.com/sslotin/amh-code/tree/main/branching).
- Benchmarks for [RAM and CPU cache system](https://github.com/sslotin/amh-code/tree/main/cpu-cache).

At the implementation stage:
- Ordered trees (apply the same technique as with binary search, but with dynamically allocated B-tree nodes)
- Range minimum queries (both static and dynamic)
- Filters (Bloom, cuckoo, xor, theoretical minimum)
- Dot product / logistic regression (Newton's method, SIMD, quantization)
- Prime number sieves (blocking plus wheel)
- Sorting (speeding up quicksort and mergesort with SIMD and radix sort)
- Writing series of integers (SIMD + fast mod-10)
- Bitmaps (blocking, SIMD)

At the idea stage:
- String searching (SIMD-based strstr and rolling hashing)
- Using SIMD to speed up Pollard's algorithm (naive sqrt-parallelization)
- SIMD-based random number generation and hashing
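
The binary search entry above is about memory layouts that make the search friendlier to caches and branch predictors. As a rough illustration of that general idea (not the S+ tree from `bplus.cc`), here is a minimal sketch of a branchless lower bound over an Eytzinger (BFS-order) array, one of the simpler layouts in this family. The helper names are hypothetical, and `__builtin_ffsll` assumes GCC or Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fills t (1-indexed, size n + 1) with the sorted values of a in
// Eytzinger order via an in-order walk of the implicit tree (children of k are 2k, 2k+1).
static std::size_t fill_eytzinger(const std::vector<int> &a, std::vector<int> &t,
                                  std::size_t i, std::size_t k) {
    if (k < t.size()) {
        i = fill_eytzinger(a, t, i, 2 * k);      // left subtree: smaller values
        t[k] = a[i++];                           // this node
        i = fill_eytzinger(a, t, i, 2 * k + 1);  // right subtree: larger values
    }
    return i;
}

// Hypothetical helper: builds the Eytzinger array; t[0] is unused.
std::vector<int> build_eytzinger(const std::vector<int> &sorted) {
    std::vector<int> t(sorted.size() + 1);
    std::size_t used = fill_eytzinger(sorted, t, 0, 1);
    assert(used == sorted.size());
    (void)used;  // silence the unused-variable warning when NDEBUG is set
    return t;
}

// Returns the index k in t of the first element >= x, or 0 if there is none.
std::size_t eytzinger_lower_bound(const std::vector<int> &t, int x) {
    std::size_t k = 1;
    while (k < t.size())
        k = 2 * k + (t[k] < x);  // the comparison result selects the child instead of an if/else
    // The bits of k record the path taken (1 = went right). Dropping the trailing
    // right turns plus the last left turn yields the last node where we went left.
    k >>= __builtin_ffsll(static_cast<long long>(~k));
    return k;
}
```

The result is a position in `t`, not in the sorted input, so a caller would read `t[k]` (after checking `k != 0`) or keep a parallel array of original indices.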
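
For the GCD entry, here is a hedged sketch of the classic binary GCD (Stein's algorithm), which replaces division with shifts and subtractions. It is not necessarily the exact variant benchmarked in the linked article. `__builtin_ctzll` is a GCC/Clang builtin whose result is undefined for a zero argument, which is why zeros are handled up front.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Binary GCD (Stein's algorithm): only shifts, comparisons, and subtractions.
std::uint64_t binary_gcd(std::uint64_t a, std::uint64_t b) {
    if (a == 0) return b;  // __builtin_ctzll(0) is undefined, so handle zeros first
    if (b == 0) return a;
    int shift = __builtin_ctzll(a | b);  // largest power of two dividing both
    a >>= __builtin_ctzll(a);            // make a odd
    do {
        assert(b != 0);                  // loop invariant: b is nonzero here
        b >>= __builtin_ctzll(b);        // make b odd; the gcd is unchanged since a is odd
        if (a > b) std::swap(a, b);      // keep a <= b
        b -= a;                          // odd - odd is even (or zero)
    } while (b != 0);
    return a << shift;
}
```

The `__builtin_ctzll` calls compile to a single trailing-zero-count instruction on common targets, which is what makes removing all powers of two in one step cheap.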
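
The integer-parsing entry does not link to code. A common baseline for beating `scanf("%d")` is to read all input into one buffer and convert digits by hand, avoiding `scanf`'s per-call overhead. The sketch below shows only that baseline (no SIMD), assumes well-formed input where every value fits in an `int`, and the name `parse_ints` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: parses decimal integers (optionally negative) separated by any
// characters other than digits and '-'. Assumes well-formed input: every '-' is directly
// followed by a digit and every value fits in an int.
std::vector<int> parse_ints(std::string_view s) {
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    std::vector<int> out;
    std::size_t i = 0, n = s.size();
    while (true) {
        while (i < n && !is_digit(s[i]) && s[i] != '-') ++i;  // skip separators
        if (i == n) break;
        bool neg = s[i] == '-';
        if (neg) ++i;
        assert(i < n && is_digit(s[i]));  // well-formed input assumption
        long long v = 0;  // wider accumulator so the magnitude of INT_MIN does not overflow
        while (i < n && is_digit(s[i]))
            v = v * 10 + (s[i++] - '0');
        out.push_back(static_cast<int>(neg ? -v : v));
    }
    return out;
}
```

In practice the buffer would come from a single bulk read of the input (for example `fread` on `stdin`), and the `string_view` keeps the parser independent of how that buffer is obtained.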
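
For the prime sieve entry (blocking plus wheel), here is a minimal sketch that assumes the simplest possible wheel, which just skips even numbers, combined with blocking: the range is sieved in fixed-size segments so the working array stays in cache. The default block size of 2^15 entries is an arbitrary assumption rather than a tuned value, and `count_primes` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Counts primes <= n with a segmented sieve that stores only odd numbers.
std::int64_t count_primes(std::int64_t n, std::int64_t block = 1 << 15) {
    assert(block > 0);
    if (n < 2) return 0;
    // 1. Plain sieve for the odd base primes up to r = floor(sqrt(n)).
    std::int64_t r = 1;
    while ((r + 1) * (r + 1) <= n) ++r;
    std::vector<char> small(r + 1, 1);
    std::vector<std::int64_t> base;
    for (std::int64_t p = 3; p <= r; p += 2)
        if (small[p]) {
            base.push_back(p);
            for (std::int64_t q = p * p; q <= r; q += 2 * p) small[q] = 0;
        }
    std::int64_t count = 1;  // the prime 2, which the odd-only segments never see
    std::vector<char> seg(block);
    // 2. Each segment holds the odd numbers lo, lo + 2, ...; index j <-> lo + 2j.
    for (std::int64_t lo = 3; lo <= n; lo += 2 * block) {
        std::int64_t hi = std::min(n, lo + 2 * block - 1);  // inclusive upper bound
        std::fill(seg.begin(), seg.end(), 1);
        for (std::int64_t p : base) {
            // First multiple of p that is >= lo and >= p*p, then moved to an odd one.
            std::int64_t start = std::max(p * p, (lo + p - 1) / p * p);
            if (start % 2 == 0) start += p;
            for (std::int64_t m = start; m <= hi; m += 2 * p) seg[(m - lo) / 2] = 0;
        }
        for (std::int64_t m = lo; m <= hi; m += 2) count += seg[(m - lo) / 2];
    }
    return count;
}
```

Stepping by `2 * p` visits only odd multiples, so half of the crossing-off work disappears; a larger wheel (for example mod 30) would skip multiples of 3 and 5 as well, at the cost of more complex indexing.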