{"id":19041039,"url":"https://github.com/everythingme/inbloom","last_synced_at":"2025-04-05T11:10:52.797Z","repository":{"id":45315029,"uuid":"39391692","full_name":"EverythingMe/inbloom","owner":"EverythingMe","description":"Cross language bloom filter implementation","archived":false,"fork":false,"pushed_at":"2022-07-21T06:22:28.000Z","size":132,"stargazers_count":297,"open_issues_count":6,"forks_count":29,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-03-29T10:07:30.281Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EverythingMe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-07-20T15:22:39.000Z","updated_at":"2025-03-06T15:59:52.000Z","dependencies_parsed_at":"2022-08-23T17:40:08.995Z","dependency_job_id":null,"html_url":"https://github.com/EverythingMe/inbloom","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EverythingMe%2Finbloom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EverythingMe%2Finbloom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EverythingMe%2Finbloom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EverythingMe%2Finbloom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EverythingMe","download_url":"https://codeload.github.com/EverythingMe/inbloom/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"git
hub","repositories_count":247325693,"owners_count":20920714,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T22:26:36.932Z","updated_at":"2025-04-05T11:10:52.752Z","avatar_url":"https://github.com/EverythingMe.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"## inbloom\n\n_inbloom_ - a cross language Bloom filter implementation (https://en.wikipedia.org/wiki/Bloom_filter).\n\n![inbloom](https://raw.githubusercontent.com/EverythingMe/inbloom/master/inbloom.png)\n\n## What's a Bloom Filter?\nA Bloom filter is a probabilistic data structure which provides an extremely space-efficient method of representing large sets.\nIt can have false positives but never false negatives, which means a query returns either \"possibly in set\" or \"definitely not in set\".\n\nYou can tune a Bloom filter to the desired error rate; it's basically a tradeoff between size and accuracy (see example: http://hur.st/bloomfilter). For example, a filter for about 100 keys with a 1% error rate can be compressed to just 120 bytes. With a 5% error rate it can be compressed to 78 bytes.\n\n## Why Cross Language?\nAt EverythingMe we have an Android client written in Java, and servers written mostly in Python and Go. 
When we wanted to pass filters from the client to the server to avoid saving some state on the server side, we needed an efficient implementation that can read and write Bloom filters in all three languages at least, and found none.\n\nHaving such a library allows us to send filters between clients and any server component easily.\n\nSo we decided to build on top of an existing simple implementation in C called libbloom (https://github.com/jvirkki/libbloom) and expand it to all 3 languages.\nWe chose to use the original C implementation for the Python version only, and **translated the code to pure Java and Go, without calling any C code**.\nWe chose this approach because the original C code is fairly short and straightforward, so porting it to other languages was a simple task;\nand avoiding calling C from Java and Go simplifies and shortens the build process, and reduces executable size - in both cases.\n\n## Filter headers\n\nInBloom provides utilities for serializing / deserializing Bloom filters so they can be sent over the network.\nSince creating a Bloom filter requires parameters for expected cardinality and false positive rate,\nthey are also needed to read a filter written by another party. Instead of choosing fixed parameters in our configurations, we opted for encoding\nthose parameters as a header when serializing the filter. We've added a 16-bit checksum for good measure as part of the header.\n\n### Serialized filter structure:\n\n| Field        | Type            | bits |\n| ------------- |:-------------:| -----:|\n| checksum      | ushort | 16 |\n| errorRate (1/N)| ushort | 16 |\n| cardinality   | int     |   32 |\n| data          | byte[]  | ? 
|\n\n\n## Installation\n\n#### Python\n```bash\npip install inbloom\n```\n\n#### Go\n```bash\ngo get github.com/EverythingMe/inbloom/go/inbloom\n```\n\n#### Java\n\nAdd the following lines to your build.gradle script.\n\n```groovy\nrepositories {\n    jcenter {\n        url 'http://dl.bintray.com/everythingme/generic'\n    }\n}\n\ndependencies {\n    compile 'me.everything:inbloom:0.1'\n}\n```\n\n### Example Usage\n\n#### Python\n```python\nimport inbloom\nimport base64\nimport requests\n\n# Basic usage\nbf = inbloom.Filter(entries=100, error=0.01)\nbf.add(\"abc\")\nbf.add(\"def\")\n\nassert bf.contains(\"abc\")\nassert bf.contains(\"def\")\nassert not bf.contains(\"ghi\")\n\nbf2 = inbloom.Filter(entries=100, error=0.01, data=bf.buffer())\nassert bf2.contains(\"abc\")\nassert bf2.contains(\"def\")\nassert not bf2.contains(\"ghi\")\n\n\n# Serialization\npayload = 'Yg0AZAAAABQAAAAAACAAEAAIAAAAAAAAIAAQAAgABAA='\nassert base64.b64encode(inbloom.dump(inbloom.load(base64.b64decode(payload)))) == payload\n\n# Sending it over HTTP\nserialized = base64.b64encode(inbloom.dump(bf))\nrequests.get('http://api.endpoint.me', params={'filter': serialized})\n```\n\n#### Go\n```go\n// create a blank filter - expecting 20 members and an error rate of 1/100\nf, err := NewFilter(20, 0.01)\nif err != nil {\n    panic(err)\n}\n\n// the size of the filter\nfmt.Println(f.Len())\n\n// insert some values\nf.Add(\"foo\")\nf.Add(\"bar\")\n\n// test for existence of keys\nfmt.Println(f.Contains(\"foo\"))\nfmt.Println(f.Contains(\"wat\"))\n\nfmt.Println(\"marshaled data:\", f.MarshalBase64())\n\n// Output:\n// 24\n// true\n// false\n// marshaled data: oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA=\n```\n\n```go\n// a 20 cardinality 0.01 precision filter with \"foo\" and \"bar\" in it\ndata := \"oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA=\"\n\n// load it from base64\nf, err := UnmarshalBase64(data)\nif err != nil {\n    panic(err)\n}\n\n// test 
it...\nfmt.Println(f.Contains(\"foo\"))\nfmt.Println(f.Contains(\"wat\"))\nfmt.Println(f.Len())\n\n// dump to pure binary\nfmt.Printf(\"%x\\n\", f.Marshal())\n// Output:\n// true\n// false\n// 24\n// a14e006400000014000000000042000011001804000200200000301000090000\n```\n\n#### Java\n```java\nimport me.everything.inbloom.BloomFilter;\nimport me.everything.inbloom.BinAscii;  // Optional - for hex representation\n\n// The basics\nBloomFilter bf = new BloomFilter(20, 0.01);\nbf.add(\"foo\");\nbf.add(\"bar\");\n\nassertTrue(bf.contains(\"foo\"));\nassertTrue(bf.contains(\"bar\"));\nassertFalse(bf.contains(\"baz\"));\n\n\nBloomFilter bf2 = new BloomFilter(bf.bf, bf.entries, bf.error);\nassertTrue(bf2.contains(\"foo\"));\nassertTrue(bf2.contains(\"bar\"));\nassertFalse(bf2.contains(\"baz\"));\n\n// Serialization\nString serialized = BinAscii.hexlify(BloomFilter.dump(bf));\nSystem.out.printf(\"Serialized: %s\\n\", serialized);\n\nString hexPayload = \"620d006400000014000000000020001000080000000000002000100008000400\";\nBloomFilter deserialized = BloomFilter.load(BinAscii.unhexlify(hexPayload));\nString dump = BinAscii.hexlify(BloomFilter.dump(deserialized));\nSystem.out.printf(\"Re-Serialized: %s\\n\", dump);\nassertEquals(dump.toLowerCase(), hexPayload);\n\nassertEquals(deserialized.entries, 20);\nassertEquals(deserialized.error, 0.01);\nassertTrue(deserialized.contains(\"abc\"));\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feverythingme%2Finbloom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feverythingme%2Finbloom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feverythingme%2Finbloom/lists"}