https://github.com/explosion/preshed
💥 Cython hash tables that assume keys are pre-hashed
- Host: GitHub
- URL: https://github.com/explosion/preshed
- Owner: explosion
- License: mit
- Created: 2014-09-26T16:25:02.000Z (over 11 years ago)
- Default Branch: v3.0.x
- Last Pushed: 2026-03-23T08:45:19.000Z (about 1 month ago)
- Last Synced: 2026-03-24T05:59:55.612Z (about 1 month ago)
- Topics: cython, hash-table, hash-tables, hashing, python
- Language: Cython
- Homepage:
- Size: 193 KB
- Stars: 87
- Watchers: 9
- Forks: 24
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# preshed: Cython Hash Table for Pre-Hashed Keys
Simple but high performance Cython hash table mapping pre-randomized keys to
`void*` values. Inspired by
[Jeff Preshing](http://preshing.com/20130107/this-hash-table-is-faster-than-a-judy-array/).
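
Because the table expects keys that are already well-mixed 64-bit integers, callers hash their own data before inserting it. A minimal sketch of one way to do that with the standard library follows. The helper name `hash_key` and the choice of BLAKE2b are assumptions made for illustration; preshed does not prescribe a hash function.

```python
import hashlib


def hash_key(text: str) -> int:
    """Map a string to a well-mixed unsigned 64-bit integer (hypothetical helper)."""
    # An 8-byte digest fits exactly into the uint64 key range.
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


apple = hash_key("apple")
pear = hash_key("pear")
```

The resulting integers can then be used wherever the examples below use `key`.
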
All Python APIs provided by the `BloomFilter` and `PreshMap` classes are
thread-safe on both the GIL-enabled build and the free-threaded build of Python
3.14 and newer. If you use the C API or the `PreshCounter` class in a
multithreaded environment, you must provide external synchronization for the
data structures provided by this library.
## Installation
```bash
pip install preshed --only-binary preshed
```
Or with conda:
```bash
conda install -c conda-forge preshed
```
## Usage
### PreshMap
A hash map for pre-hashed keys, mapping `uint64` to `uint64` values.
```python
from preshed.maps import PreshMap
map = PreshMap() # create with default size
map = PreshMap(initial_size=1024) # create with initial capacity (must be power of 2)
map[key] = value # set a value
value = map[key] # get a value (returns None if missing)
value = map.pop(key) # remove and return a value
del map[key] # delete a key
key in map # membership test
len(map) # number of entries
for key in map: # iterate over keys
    pass
for key, value in map.items(): # iterate over key-value pairs
    pass
for value in map.values(): # iterate over values
    pass
```
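
To illustrate why a power-of-2 capacity pairs well with pre-hashed keys, here is a toy open-addressing table with linear probing, written in pure Python. It is a sketch of the general technique only, not preshed's implementation; the class name `TinyProbeMap` and the 50% growth threshold are assumptions chosen for the example.

```python
class TinyProbeMap:
    """Toy open-addressing table with linear probing (illustration only)."""

    def __init__(self, initial_size: int = 8):
        if initial_size <= 0 or initial_size & (initial_size - 1):
            raise ValueError("initial_size must be a power of 2")
        self._keys = [None] * initial_size
        self._values = [None] * initial_size
        self._filled = 0

    def _slot(self, key: int) -> int:
        # With a power-of-2 length, `key & mask` replaces a modulo and
        # uses the low bits of the already-mixed key directly.
        mask = len(self._keys) - 1
        i = key & mask
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) & mask  # step to the next slot, wrapping around
        return i

    def __setitem__(self, key: int, value) -> None:
        i = self._slot(key)
        if self._keys[i] is None:
            self._keys[i] = key
            self._filled += 1
        self._values[i] = value
        if self._filled * 2 > len(self._keys):
            self._grow()

    def _grow(self) -> None:
        # Double the capacity and reinsert every live entry.
        old = [(k, v) for k, v in zip(self._keys, self._values) if k is not None]
        size = len(self._keys) * 2
        self._keys = [None] * size
        self._values = [None] * size
        self._filled = 0
        for k, v in old:
            self[k] = v

    def get(self, key: int, default=None):
        i = self._slot(key)
        return self._values[i] if self._keys[i] == key else default


table = TinyProbeMap(initial_size=4)
table[5] = 50
table[9] = 90  # 5 and 9 share the same low bits, so 9 probes to the next slot
```

Since no hash function runs inside the table, a lookup costs one mask plus a short scan over adjacent slots, provided the keys are well distributed.
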
### BloomFilter
A probabilistic set for fast membership testing of integer keys.
```python
from preshed.bloom import BloomFilter
bloom = BloomFilter(size=1024, hash_funcs=23) # explicit parameters
bloom = BloomFilter.from_error_rate(10000, error_rate=1e-4) # auto-sized
bloom.add(42) # add a key
42 in bloom # membership test (may have false positives)
data = bloom.to_bytes() # serialize
bloom.from_bytes(data) # deserialize in-place
```
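
For intuition about how a filter can be sized from a target error rate, the textbook Bloom filter formulas can be sketched as below. The function name `bloom_parameters` is hypothetical, and whether `BloomFilter.from_error_rate` applies exactly this rounding is an assumption, not something the README states.

```python
import math


def bloom_parameters(members: int, error_rate: float) -> tuple[int, int]:
    """Return (bit count, hash function count) from the standard formulas."""
    # m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-members * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, with at least one hash function
    hashes = max(1, round(bits / members * math.log(2)))
    return bits, hashes


bits, hashes = bloom_parameters(10_000, 1e-4)
```

For 10,000 members and a 1e-4 error rate this works out to about 19 bits per member and 13 hash functions.
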
### PreshCounter
A counter backed by a hash map, for counting occurrences of `uint64` keys.
```python
from preshed.counter import PreshCounter
counter = PreshCounter()
counter.inc(key, 1) # increment key by 1
count = counter[key] # get current count
len(counter) # number of buckets
for key, count in counter: # iterate over entries
    pass
counter.smooth() # apply Good-Turing smoothing
prob = counter.prob(key) # get smoothed probability
```
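
As noted near the top of this README, `PreshCounter` needs external synchronization when shared between threads. One possible approach is a thin wrapper that holds a lock around every call. In the sketch below, `LockedCounter` is a hypothetical helper, and `DictCounter` is a stand-in with the same `inc()` and `[]` shape so that the example runs without preshed installed; in real code you would pass a `PreshCounter()` instead.

```python
import threading


class DictCounter:
    """Hypothetical stand-in with the same inc()/[] shape as PreshCounter."""

    def __init__(self):
        self._counts = {}

    def inc(self, key, amount):
        self._counts[key] = self._counts.get(key, 0) + amount

    def __getitem__(self, key):
        return self._counts.get(key, 0)


class LockedCounter:
    """Serialize every access to the wrapped counter with a single lock."""

    def __init__(self, counter):
        self._counter = counter
        self._lock = threading.Lock()

    def inc(self, key, amount=1):
        with self._lock:
            self._counter.inc(key, amount)

    def __getitem__(self, key):
        with self._lock:
            return self._counter[key]


shared = LockedCounter(DictCounter())  # swap in PreshCounter() in real code


def worker():
    for _ in range(1000):
        shared.inc(42, 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = shared[42]
```

A single coarse lock is the simplest choice here; it trades some parallelism for an easy guarantee that no two threads touch the counter at once.
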
### Cython API
All classes expose a C-level API via `.pxd` files for use in Cython
extensions. The low-level `MapStruct` and `BloomStruct` functions operate
on raw structs and can be called without the GIL:
```cython
from preshed.maps cimport PreshMap, map_get, map_set, map_iter, key_t
from preshed.bloom cimport BloomFilter, bloom_add, bloom_contains
cdef PreshMap table = PreshMap()
# Low-level nogil access (requires external synchronization)
cdef void* value
with nogil:
    value = map_get(table.c_map, some_key)
```