https://github.com/hajimes/mmh3
Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.
- Host: GitHub
- URL: https://github.com/hajimes/mmh3
- Owner: hajimes
- License: mit
- Created: 2013-02-10T15:48:12.000Z (almost 13 years ago)
- Default Branch: master
- Last Pushed: 2025-10-06T11:56:12.000Z (about 1 month ago)
- Last Synced: 2025-10-17T20:27:43.536Z (about 1 month ago)
- Topics: cpython, hash, murmurhash, murmurhash3, python
- Language: C
- Homepage: https://pypi.org/project/mmh3/
- Size: 3.45 MB
- Stars: 351
- Watchers: 7
- Forks: 73
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: docs/CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-python-data-science - mmh3 - MurmurHash3, a set of fast and robust hash functions. (Misc / Ranking/Recommender)
README
# mmh3
`mmh3` is a Python extension for
[MurmurHash (MurmurHash3)](https://en.wikipedia.org/wiki/MurmurHash), a set of
fast and robust non-cryptographic hash functions invented by Austin Appleby.
By combining `mmh3` with probabilistic techniques like
[Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter),
[MinHash](https://en.wikipedia.org/wiki/MinHash), and
[feature hashing](https://en.wikipedia.org/wiki/Feature_hashing), you can
develop high-performance systems in fields such as data mining, machine
learning, and natural language processing.
Another popular use of `mmh3` is to
[calculate favicon hashes](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a),
which are utilized by [Shodan](https://www.shodan.io), the world's first IoT
search engine.
This page provides a quick start guide. For more comprehensive information,
please refer to the [documentation](https://mmh3.readthedocs.io/en/stable/).
## Installation
```shell
pip install mmh3
```
## Usage
### Basic usage
```pycon
>>> import mmh3
>>> mmh3.hash(b"foo") # returns a 32-bit signed int
-156908512
>>> mmh3.hash("foo") # accepts str (UTF-8 encoded)
-156908512
>>> mmh3.hash(b"foo", 42) # uses 42 as the seed
-1322301282
>>> mmh3.hash(b"foo", 0, False) # returns a 32-bit unsigned int
4138058784
```
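For readers curious about what `mmh3.hash()` computes, the following pure-Python sketch follows the MurmurHash3 x86_32 algorithm from Appleby's public-domain reference code. It is only an illustration: the name `murmur3_32` is hypothetical and not part of `mmh3`, and the C extension is far faster.

```python
_MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & _MASK32


def murmur3_32(data, seed=0, signed=True):
    """Illustrative (slow) MurmurHash3 x86_32; not part of mmh3."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & _MASK32
    body_len = len(data) // 4 * 4

    # Body: mix each 4-byte block, always read as little-endian.
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & _MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & _MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & _MASK32

    # Tail: fold in the remaining 0-3 bytes.
    tail = data[body_len:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & _MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & _MASK32
        h ^= k

    # Finalization: mix in the length, then avalanche the bits.
    h ^= len(data) & _MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & _MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & _MASK32
    h ^= h >> 16

    # Reinterpret as two's complement when a signed result is requested.
    return h - (1 << 32) if signed and h >= (1 << 31) else h
```

With this sketch, `murmur3_32(b"foo")` gives `-156908512` and `murmur3_32(b"foo", 0, False)` gives `4138058784`, the same values as in the session above.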
`mmh3.mmh3_x64_128_digest()`, introduced in version 5.0.0, efficiently hashes
objects that implement the buffer protocol
([PEP 688](https://peps.python.org/pep-0688/)) without internal memory copying.
The function returns a `bytes` object of 16 bytes (128 bits). It is
particularly suited for hashing large memory views, such as
`bytearray`, `memoryview`, and `numpy.ndarray`, and performs faster than
the 32-bit variants like `hash()` on 64-bit machines.
```pycon
>>> import numpy
>>> mmh3.mmh3_x64_128_digest(numpy.random.rand(100)) # output varies with the random input
b'\x8c\xee\xc6z\xa9\xfeR\xe8o\x9a\x9b\x17u\xbe\xdc\xee'
```
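The returned `bytes` object can be converted to other representations with the standard library alone. The sketch below reuses the sample digest shown above; treating the bytes as little-endian is an assumption made for this example, so pick whichever byte order your storage format expects and use it consistently.

```python
import struct

# Sample 16-byte digest copied from the session above.
digest = b"\x8c\xee\xc6z\xa9\xfeR\xe8o\x9a\x9b\x17u\xbe\xdc\xee"

# Hex string, convenient for logs or text-based storage.
digest_hex = digest.hex()

# A single 128-bit integer (the byte order is our choice here).
as_int = int.from_bytes(digest, "little")

# Two unsigned 64-bit integers; in little-endian order the low half comes first.
low, high = struct.unpack("<QQ", digest)

print(digest_hex)  # 8ceec67aa9fe52e86f9a9b1775bedcee
```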
Various alternatives are available, offering different return types (e.g.,
signed integers, tuples of unsigned integers) and optimized for different
architectures. For a comprehensive list of functions, refer to the
[API Reference](https://mmh3.readthedocs.io/en/stable/api.html).
### `hashlib`-style hashers
`mmh3` implements hasher objects with interfaces similar to those in `hashlib`
from the standard library, although they are still experimental. See
[Hasher Classes](https://mmh3.readthedocs.io/en/stable/api.html#hasher-classes)
in the API Reference for more information.
## Changelog
See [Changelog (latest version)](https://mmh3.readthedocs.io/en/latest/changelog.html)
for the complete changelog.
### [5.2.0] - 2025-07-29
#### Added
- Add support for Python 3.14, including 3.14t (no-GIL) wheels. However, thread
safety for the no-GIL variant is not fully tested yet. Please report any
issues you encounter ([#134](https://github.com/hajimes/mmh3/pull/134),
[#136](https://github.com/hajimes/mmh3/pull/136)).
- Add support for Android (Python 3.13 only) and iOS (Python 3.13 and 3.14) wheels,
enabled by the major version update of
[cibuildwheel](https://github.com/pypa/cibuildwheel)
([#135](https://github.com/hajimes/mmh3/pull/135)).
### [5.1.0] - 2025-01-25
#### Added
- Improve the performance of `hash128()`, `hash64()`, and `hash_bytes()` by
using
[METH_FASTCALL](https://docs.python.org/3/c-api/structures.html#c.METH_FASTCALL),
reducing the overhead of function calls
([#116](https://github.com/hajimes/mmh3/pull/116)).
- Add the software paper for this library
([doi:10.21105/joss.06124](https://doi.org/10.21105/joss.06124)), following
its publication in the
[_Journal of Open Source Software_](https://joss.theoj.org)
([#118](https://github.com/hajimes/mmh3/pull/118)).
#### Removed
- Drop support for Python 3.8, as it reached end of life on 2024-10-07
([#117](https://github.com/hajimes/mmh3/pull/117)).
### [5.0.1] - 2024-09-22
#### Fixed
- Fix the issue that the package cannot be built from the source distribution
([#90](https://github.com/hajimes/mmh3/issues/90)).
## License
[MIT](https://github.com/hajimes/mmh3/blob/master/LICENSE), unless otherwise
noted within a file.
## Frequently Asked Questions
### Different results from other MurmurHash3-based libraries
By default, `mmh3` returns **signed** values for the 32-bit and 64-bit versions
and **unsigned** values for `hash128`, for historical reasons. To match the
output of another library, pass the `signed` keyword argument explicitly.
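If another library reports the other signedness, the two forms can be converted into each other, since they are the same bits read either as two's complement or as a plain unsigned number. A minimal sketch (the helper names `to_unsigned` and `to_signed` are hypothetical, not part of `mmh3`):

```python
def to_unsigned(value, bits=32):
    """Reinterpret a signed integer as unsigned with the given bit width."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits=32):
    """Reinterpret an unsigned integer as two's complement with the given bit width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


# The signed and unsigned results of mmh3.hash(b"foo") from the Basic usage section.
print(to_unsigned(-156908512))  # 4138058784
print(to_signed(4138058784))    # -156908512
```

Pass `bits=64` or `bits=128` for the wider variants.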
Starting from version 4.0.0, **`mmh3` is endian-neutral**, meaning that its
hash functions return the same values on big-endian platforms as they do on
little-endian ones. In contrast, the original C++ library by Appleby is
endian-sensitive. If you need results that comply with the original library on
big-endian systems, please use version 3.\*.
For compatibility with [Google Guava (Java)](https://github.com/google/guava),
see
.
For compatibility with
[murmur3 (Go)](https://pkg.go.dev/github.com/spaolacci/murmur3), see
.
### Handling errors with negative seeds
Starting from version 5.0.0, `mmh3` functions accept only **unsigned** 32-bit integer
seeds to enable faster type-checking and conversion. However, this change may
cause issues if you need to calculate hash values using negative seeds within
the range of signed 32-bit integers. For instance,
[Telegram-iOS](https://github.com/TelegramMessenger/Telegram-iOS) uses
`-137723950` as a hard-coded seed (bitwise equivalent to `4157243346`). To
handle such cases, you can convert a signed 32-bit integer to its unsigned
equivalent by applying a bitwise AND operation with `0xffffffff`. Here's an
example:
```pycon
>>> mmh3.hash(b"quux", 4294967295)
258499980
>>> d = -1
>>> mmh3.hash(b"quux", d & 0xffffffff)
258499980
```
Alternatively, if the seed is hard-coded (as in the Telegram-iOS case), you can
precompute the unsigned value for simplicity.
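If seeds come from configuration or user input, it may help to validate the range before masking, so that out-of-range values fail loudly instead of being silently truncated. The following is a sketch under that assumption; `to_unsigned_seed` is a hypothetical helper, not part of `mmh3`.

```python
def to_unsigned_seed(seed):
    """Map a signed or unsigned 32-bit seed onto the unsigned 32-bit range."""
    if not -(1 << 31) <= seed < (1 << 32):
        raise ValueError(f"seed {seed} does not fit in 32 bits")
    return seed & 0xFFFFFFFF


# The hard-coded Telegram-iOS seed mentioned above.
print(to_unsigned_seed(-137723950))  # 4157243346
```

The result can then be passed as the seed, for example `mmh3.hash(data, to_unsigned_seed(-137723950))`.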
## Contributing Guidelines
See [Contributing](https://mmh3.readthedocs.io/en/stable/CONTRIBUTING.html).
## Authors
MurmurHash3 was originally developed by Austin Appleby and released into the
public domain at
[https://github.com/aappleby/smhasher](https://github.com/aappleby/smhasher).
Ported and modified for Python by Hajime Senuma.
## External Tutorials
### High-performance computing
The following textbooks and tutorials are great resources for learning how to
use `mmh3` (and other hash algorithms in general) for high-performance computing.
- Chapter 11: _Using Less RAM_ in Micha Gorelick and Ian Ozsvald. 2014. _High
Performance Python: Practical Performant Programming for Humans_. O'Reilly
Media. [ISBN: 978-1-4493-6159-4](https://www.amazon.com/dp/1449361595).
- 2nd edition of the above (2020).
[ISBN: 978-1492055020](https://www.amazon.com/dp/1492055026).
- Max Burstein. February 2, 2013.
_[Creating a Simple Bloom Filter](http://www.maxburstein.com/blog/creating-a-simple-bloom-filter/)_.
- Duke University. April 14, 2016.
_[Efficient storage of data in memory](http://people.duke.edu/~ccc14/sta-663-2016/20B_Big_Data_Structures.html)_.
- Bugra Akyildiz. August 24, 2016.
_[A Gentle Introduction to Bloom Filter](https://www.kdnuggets.com/2016/08/gentle-introduction-bloom-filter.html)_.
KDnuggets.
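As a taste of what these resources cover, here is a minimal Bloom filter sketch. It is not taken from any of them: the class name, the textbook sizing formulas, and the double-hashing scheme are choices made for this example. By default it derives two 64-bit values from `mmh3.hash64()`; the optional `hash_pair` argument is a hypothetical hook for swapping in another hash function.

```python
import math


class BloomFilter:
    """Minimal Bloom filter using double hashing (illustrative only)."""

    def __init__(self, capacity, error_rate=0.01, hash_pair=None):
        ln2 = math.log(2)
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / ln2**2))
        self.num_hashes = max(1, round(self.num_bits / capacity * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)
        if hash_pair is None:
            import mmh3  # imported lazily so another hash can be injected

            def hash_pair(data):
                # hash64 returns a tuple of two 64-bit integers.
                return mmh3.hash64(data, signed=False)

        self._hash_pair = hash_pair

    def _positions(self, item):
        # Double hashing: derive k bit positions from two base hash values.
        h1, h2 = self._hash_pair(item)
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

Typical usage would be `bf = BloomFilter(capacity=1000)`, then `bf.add(b"foo")` and `b"foo" in bf`. A Bloom filter never gives false negatives, but membership tests can return false positives at roughly the configured error rate.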
### Internet of things
[Shodan](https://www.shodan.io), the world's first
[IoT](https://en.wikipedia.org/wiki/Internet_of_things) search engine, uses
MurmurHash3 hash values for [favicons](https://en.wikipedia.org/wiki/Favicon)
(icons associated with web pages). [ZoomEye](https://www.zoomeye.org) follows
Shodan's convention.
[Calculating these values with mmh3](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a)
is useful for OSINT and cybersecurity activities.
- Jan Kopriva. April 19, 2021.
_[Hunting phishing websites with favicon hashes](https://isc.sans.edu/diary/Hunting+phishing+websites+with+favicon+hashes/27326)_.
SANS Internet Storm Center.
- Nikhil Panwar. May 2, 2022.
_[Using Favicons to Discover Phishing & Brand Impersonation Websites](https://bolster.ai/blog/how-to-use-favicons-to-find-phishing-websites)_.
Bolster.
- Faradaysec. July 25, 2022.
_[Understanding Spring4Shell: How used is it?](https://faradaysec.com/understanding-spring4shell/)_.
Faraday Security.
- Debjeet. August 2, 2022.
_[How To Find Assets Using Favicon Hashes](https://payatu.com/blog/favicon-hash/)_.
Payatu.
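The calculation itself is short. The sketch below follows the commonly shared recipe (hash the Base64 encoding of the icon bytes, line breaks included, with the signed 32-bit function); this is an assumption about the convention rather than something this README specifies. The function name `favicon_hash` and its `hash32` parameter are hypothetical, and downloading the icon is left to the caller.

```python
import base64


def favicon_hash(icon_bytes, hash32=None):
    """Return the signed 32-bit MurmurHash3 of a Base64-encoded favicon."""
    if hash32 is None:
        import mmh3  # imported lazily so another hash can be injected

        hash32 = mmh3.hash
    # encodebytes() inserts a newline after every 76 output characters and at
    # the end; the recipe hashes the encoded bytes with these newlines intact.
    encoded = base64.encodebytes(icon_bytes)
    return hash32(encoded)
```

For example, after reading an icon with `icon = open("favicon.ico", "rb").read()`, calling `favicon_hash(icon)` yields the integer to use in a search query.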
## How to Cite This Library
If you use this library in your research, it would be appreciated if you could
cite the following paper published in the
[_Journal of Open Source Software_](https://joss.theoj.org):
Hajime Senuma. 2025.
[mmh3: A Python extension for MurmurHash3](https://doi.org/10.21105/joss.06124).
_Journal of Open Source Software_, 10(105):6124.
In BibTeX format:
```tex
@article{senumaMmh3PythonExtension2025,
title = {{mmh3}: A {Python} extension for {MurmurHash3}},
author = {Senuma, Hajime},
year = {2025},
month = jan,
journal = {Journal of Open Source Software},
volume = {10},
number = {105},
pages = {6124},
issn = {2475-9066},
doi = {10.21105/joss.06124},
copyright = {http://creativecommons.org/licenses/by/4.0/}
}
```
## Related Libraries
- mmh3 in pure Python (Fredrik Kihlander and Swapnil Gusani)
- Python bindings for CityHash (Eugene Scherba)
- Python bindings for FarmHash (Veelion Chong)
- Python bindings for MetroHash (Eugene Scherba)
- Python bindings for xxHash (Yue Du)
[5.2.0]: https://github.com/hajimes/mmh3/compare/v5.1.0...v5.2.0
[5.1.0]: https://github.com/hajimes/mmh3/compare/v5.0.1...v5.1.0
[5.0.1]: https://github.com/hajimes/mmh3/compare/v5.0.0...v5.0.1