# mmh3

[![Documentation Status](https://readthedocs.org/projects/mmh3/badge/?version=stable)](https://mmh3.readthedocs.io/en/stable/)
[![GitHub Super-Linter](https://github.com/hajimes/mmh3/actions/workflows/superlinter.yml/badge.svg?branch=master)](https://github.com/hajimes/mmh3/actions?query=workflow%3ASuper-Linter+branch%3Amaster)
[![Build](https://github.com/hajimes/mmh3/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/hajimes/mmh3/actions/workflows/build.yml?branch=master)
[![PyPi Version](https://img.shields.io/pypi/v/mmh3.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/mmh3/)
[![Python Versions](https://img.shields.io/pypi/pyversions/mmh3.svg)](https://pypi.org/project/mmh3/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/license/mit/)
[![Total Downloads](https://static.pepy.tech/badge/mmh3)](https://pepy.tech/projects/mmh3?versions=*%2C5.*%2C4.*%2C3.*%2C2.*)
[![Recent Downloads](https://static.pepy.tech/badge/mmh3/month)](https://pepy.tech/projects/mmh3?versions=*%2C5.*%2C4.*%2C3.*%2C2.*)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.06124/status.svg)](https://doi.org/10.21105/joss.06124)

`mmh3` is a Python extension for
[MurmurHash (MurmurHash3)](https://en.wikipedia.org/wiki/MurmurHash), a set of
fast and robust non-cryptographic hash functions invented by Austin Appleby.

By combining `mmh3` with probabilistic techniques such as
[Bloom filters](https://en.wikipedia.org/wiki/Bloom_filter),
[MinHash](https://en.wikipedia.org/wiki/MinHash), and
[feature hashing](https://en.wikipedia.org/wiki/Feature_hashing), you can
develop high-performance systems in fields such as data mining, machine
learning, and natural language processing.

Another popular use of `mmh3` is to
[calculate favicon hashes](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a),
which are utilized by [Shodan](https://www.shodan.io), the world's first IoT
search engine.

This page provides a quick start guide. For more comprehensive information,
please refer to the [documentation](https://mmh3.readthedocs.io/en/stable/).

## Installation

```shell
pip install mmh3
```

## Usage

### Basic usage

```pycon
>>> import mmh3
>>> mmh3.hash(b"foo") # returns a 32-bit signed int
-156908512
>>> mmh3.hash("foo") # accepts str (UTF-8 encoded)
-156908512
>>> mmh3.hash(b"foo", 42) # uses 42 as the seed
-1322301282
>>> mmh3.hash(b"foo", 0, False) # returns a 32-bit unsigned int
4138058784
```

`mmh3.mmh3_x64_128_digest()`, introduced in version 5.0.0, efficiently hashes
objects that implement the buffer protocol
([PEP 688](https://peps.python.org/pep-0688/)) without internal memory copying.
The function returns a `bytes` object of 16 bytes (128 bits). It is
particularly suited for hashing large buffers, such as
`bytearray`, `memoryview`, and `numpy.ndarray`, and performs faster than
the 32-bit variants like `hash()` on 64-bit machines.

```pycon
>>> import numpy
>>> mmh3.mmh3_x64_128_digest(numpy.random.rand(100)) # output varies with the random input
b'\x8c\xee\xc6z\xa9\xfeR\xe8o\x9a\x9b\x17u\xbe\xdc\xee'
```

Various alternatives are available, offering different return types (e.g.,
signed integers, tuples of unsigned integers) and optimized for different
architectures. For a comprehensive list of functions, refer to the
[API Reference](https://mmh3.readthedocs.io/en/stable/api.html).

### `hashlib`-style hashers

`mmh3` implements hasher objects with interfaces similar to those in `hashlib`
from the standard library, although they are still experimental. See
[Hasher Classes](https://mmh3.readthedocs.io/en/stable/api.html#hasher-classes)
in the API Reference for more information.

## Changelog

See [Changelog (latest version)](https://mmh3.readthedocs.io/en/latest/changelog.html)
for the complete changelog.

### [5.2.0] - 2025-07-29

#### Added

- Add support for Python 3.14, including 3.14t (no-GIL) wheels. However, thread
safety for the no-GIL variant is not fully tested yet. Please report any
issues you encounter ([#134](https://github.com/hajimes/mmh3/pull/134),
[#136](https://github.com/hajimes/mmh3/pull/136)).
- Add support for Android (Python 3.13 only) and iOS (Python 3.13 and 3.14) wheels,
enabled by the major version update of
[cibuildwheel](https://github.com/pypa/cibuildwheel)
([#135](https://github.com/hajimes/mmh3/pull/135)).

### [5.1.0] - 2025-01-25

#### Added

- Improve the performance of `hash128()`, `hash64()`, and `hash_bytes()` by
using
[METH_FASTCALL](https://docs.python.org/3/c-api/structures.html#c.METH_FASTCALL),
reducing the overhead of function calls
([#116](https://github.com/hajimes/mmh3/pull/116)).
- Add the software paper for this library
([doi:10.21105/joss.06124](https://doi.org/10.21105/joss.06124)), following
its publication in the
[_Journal of Open Source Software_](https://joss.theoj.org)
([#118](https://github.com/hajimes/mmh3/pull/118)).

#### Removed

- Drop support for Python 3.8, as it reached end of life on 2024-10-07
([#117](https://github.com/hajimes/mmh3/pull/117)).

### [5.0.1] - 2024-09-22

#### Fixed

- Fix an issue where the package could not be built from the source distribution
([#90](https://github.com/hajimes/mmh3/issues/90)).

## License

[MIT](https://github.com/hajimes/mmh3/blob/master/LICENSE), unless otherwise
noted within a file.

## Frequently Asked Questions

### Different results from other MurmurHash3-based libraries

By default, `mmh3` returns **signed** values for the 32-bit and 64-bit versions
and **unsigned** values for `hash128` for historical reasons. To switch between
signed and unsigned output, use the `signed` keyword argument.
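
If another library reports a 32-bit value with the opposite signedness, the two representations can also be converted in plain Python. The helpers below are hypothetical names written for this example, not `mmh3` functions; the sample values are the `mmh3.hash(b"foo")` results shown in the basic usage section.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


print(to_unsigned32(-156908512))  # 4138058784
print(to_signed32(4138058784))  # -156908512
```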

Starting from version 4.0.0, **`mmh3` is endian-neutral**, meaning that its
hash functions return the same values on big-endian platforms as they do on
little-endian ones. In contrast, the original C++ library by Appleby is
endian-sensitive. If you need results that comply with the original library on
big-endian systems, please use version 3.\*.

For compatibility with [Google Guava (Java)](https://github.com/google/guava),
see
.

For compatibility with
[murmur3 (Go)](https://pkg.go.dev/github.com/spaolacci/murmur3), see
.

### Handling errors with negative seeds

Starting from version 5.0.0, `mmh3` functions accept only **unsigned** 32-bit integer
seeds to enable faster type-checking and conversion. However, this change may
cause issues if you need to calculate hash values using negative seeds within
the range of signed 32-bit integers. For instance,
[Telegram-iOS](https://github.com/TelegramMessenger/Telegram-iOS) uses
`-137723950` as a hard-coded seed (bitwise equivalent to `4157243346`). To
handle such cases, you can convert a signed 32-bit integer to its unsigned
equivalent by applying a bitwise AND operation with `0xffffffff`. Here's an
example:

```pycon
>>> mmh3.hash(b"quux", 4294967295)
258499980
>>> d = -1
>>> mmh3.hash(b"quux", d & 0xffffffff)
258499980
```

Alternatively, if the seed is hard-coded (as in the Telegram-iOS case), you can
precompute the unsigned value for simplicity.
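
One way to centralize this conversion is a small helper that also rejects values outside the 32-bit range. `normalize_seed` is a hypothetical name for this sketch and is not provided by `mmh3`.

```python
def normalize_seed(seed: int) -> int:
    """Map a signed or unsigned 32-bit seed to its unsigned 32-bit form."""
    if not -(2**31) <= seed <= 2**32 - 1:
        raise ValueError(f"seed {seed} does not fit in 32 bits")
    return seed & 0xFFFFFFFF


# Precompute the unsigned form of the seed mentioned above.
TELEGRAM_SEED = normalize_seed(-137723950)
print(TELEGRAM_SEED)  # 4157243346
```

The result can then be passed as the seed argument, for example `mmh3.hash(data, TELEGRAM_SEED)`.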

## Contributing Guidelines

See [Contributing](https://mmh3.readthedocs.io/en/stable/CONTRIBUTING.html).

## Authors

MurmurHash3 was originally developed by Austin Appleby and released into the
public domain at
[https://github.com/aappleby/smhasher](https://github.com/aappleby/smhasher).

Ported and modified for Python by Hajime Senuma.

## External Tutorials

### High-performance computing

The following textbooks and tutorials are great resources for learning how to
use `mmh3` (and other hash algorithms in general) for high-performance computing.

- Chapter 11: _Using Less RAM_ in Micha Gorelick and Ian Ozsvald. 2014. _High
Performance Python: Practical Performant Programming for Humans_. O'Reilly
Media. [ISBN: 978-1-4493-6159-4](https://www.amazon.com/dp/1449361595).
- 2nd edition of the above (2020).
[ISBN: 978-1492055020](https://www.amazon.com/dp/1492055026).
- Max Burstein. February 2, 2013.
_[Creating a Simple Bloom Filter](http://www.maxburstein.com/blog/creating-a-simple-bloom-filter/)_.
- Duke University. April 14, 2016.
_[Efficient storage of data in memory](http://people.duke.edu/~ccc14/sta-663-2016/20B_Big_Data_Structures.html)_.
- Bugra Akyildiz. August 24, 2016.
_[A Gentle Introduction to Bloom Filter](https://www.kdnuggets.com/2016/08/gentle-introduction-bloom-filter.html)_.
KDnuggets.

### Internet of things

[Shodan](https://www.shodan.io), the world's first
[IoT](https://en.wikipedia.org/wiki/Internet_of_things) search engine, uses
MurmurHash3 hash values for [favicons](https://en.wikipedia.org/wiki/Favicon)
(icons associated with web pages). [ZoomEye](https://www.zoomeye.org) follows
Shodan's convention.
[Calculating these values with mmh3](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a)
is useful for OSINT and cybersecurity activities.

- Jan Kopriva. April 19, 2021.
_[Hunting phishing websites with favicon hashes](https://isc.sans.edu/diary/Hunting+phishing+websites+with+favicon+hashes/27326)_.
SANS Internet Storm Center.
- Nikhil Panwar. May 2, 2022.
_[Using Favicons to Discover Phishing & Brand Impersonation Websites](https://bolster.ai/blog/how-to-use-favicons-to-find-phishing-websites)_.
Bolster.
- Faradaysec. July 25, 2022.
_[Understanding Spring4Shell: How used is it?](https://faradaysec.com/understanding-spring4shell/)_.
Faraday Security.
- Debjeet. August 2, 2022.
_[How To Find Assets Using Favicon Hashes](https://payatu.com/blog/favicon-hash/)_.
Payatu.

## How to Cite This Library

If you use this library in your research, it would be appreciated if you could
cite the following paper published in the
[_Journal of Open Source Software_](https://joss.theoj.org):

Hajime Senuma. 2025.
[mmh3: A Python extension for MurmurHash3](https://doi.org/10.21105/joss.06124).
_Journal of Open Source Software_, 10(105):6124.

In BibTeX format:

```tex
@article{senumaMmh3PythonExtension2025,
title = {{mmh3}: A {Python} extension for {MurmurHash3}},
author = {Senuma, Hajime},
year = {2025},
month = jan,
journal = {Journal of Open Source Software},
volume = {10},
number = {105},
pages = {6124},
issn = {2475-9066},
doi = {10.21105/joss.06124},
copyright = {http://creativecommons.org/licenses/by/4.0/}
}
```

## Related Libraries

- mmh3 in pure Python (Fredrik Kihlander and Swapnil Gusani)
- Python bindings for CityHash (Eugene Scherba)
- Python bindings for FarmHash (Veelion Chong)
- Python bindings for MetroHash (Eugene Scherba)
- Python bindings for xxHash (Yue Du)

[5.2.0]: https://github.com/hajimes/mmh3/compare/v5.1.0...v5.2.0
[5.1.0]: https://github.com/hajimes/mmh3/compare/v5.0.1...v5.1.0
[5.0.1]: https://github.com/hajimes/mmh3/compare/v5.0.0...v5.0.1