{"id":26233982,"url":"https://github.com/teragrep/blf_02","last_synced_at":"2025-04-22T12:11:38.458Z","repository":{"id":65923703,"uuid":"602095219","full_name":"teragrep/blf_02","owner":"teragrep","description":"Teragrep Bloom filter plugin for MariaDB","archived":false,"fork":false,"pushed_at":"2025-02-12T09:24:42.000Z","size":31,"stargazers_count":1,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T15:02:02.672Z","etag":null,"topics":["bloom-filter","bloomfilter","mariadb","mariadb-plugin","search-optimization","teragrep"],"latest_commit_sha":null,"homepage":"https://teragrep.com","language":"M4","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/teragrep.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-15T13:42:25.000Z","updated_at":"2025-02-12T09:24:46.000Z","dependencies_parsed_at":"2024-01-08T15:13:48.461Z","dependency_job_id":"b66b9685-9580-4e4d-92dd-0fb87245d1a5","html_url":"https://github.com/teragrep/blf_02","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fblf_02","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fblf_02/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fblf_02/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fblf_02/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/teragrep","download_url":"https://codeload.github.com/teragrep/blf_02/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250237834,"owners_count":21397401,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","bloomfilter","mariadb","mariadb-plugin","search-optimization","teragrep"],"created_at":"2025-03-13T01:18:22.138Z","updated_at":"2025-04-22T12:11:38.439Z","avatar_url":"https://github.com/teragrep.png","language":"M4","funding_links":[],"categories":[],"sub_categories":[],"readme":"= BLF_02: Teragrep Bloom Filter Plugin for MariaDB\n\nThis package provides two user-defined functions (UDFs) for MySQL to efficiently work with Bloom filters:\n\n- `bloommatch` function to check whether one Bloom filter is contained in another.\n- `bloomupdate` function to combine two Bloom filters.\n\nThese UDFs enable efficient querying and manipulation of Bloom filters stored in MySQL.\nBloom filters are represented as arrays of bytes in little-endian order.\n\nLicense: Apache 2.0\n\n== Installation\nInstall the blf_02 package.\n\n[source,sh]\n----\nyum install blf_02.rpm\n----\n\n=== Enabling\n\nlink:https://mariadb.com/kb/en/user-defined-functions-security/[Read more about required permissions]\n\n==== Option 1: Execute the pre-made query\n\n[source,shell]\n----\nmariadb \u003c /opt/teragrep/blf_02/share/installdb.sql\n----\n\n==== Option 2: Execute the queries manually\n\n[source,sql]\n----\nUSE mysql;\n\nDROP FUNCTION IF EXISTS bloommatch;\nDROP FUNCTION IF EXISTS bloomupdate;\nCREATE FUNCTION 
bloommatch RETURNS integer SONAME 'lib_mysqludf_bloom.so';\nCREATE FUNCTION bloomupdate RETURNS STRING SONAME 'lib_mysqludf_bloom.so';\n----\n\n=== Disabling\n\nlink:https://mariadb.com/kb/en/user-defined-functions-security/[Read more about required permissions]\n\n==== Option 1: Execute the pre-made query\n\n[source,shell]\n----\nmariadb \u003c /opt/teragrep/blf_02/share/uninstalldb.sql\n----\n\n==== Option 2: Execute the queries manually\n\n[source,sql]\n----\nUSE mysql;\n\nDROP FUNCTION IF EXISTS bloommatch;\nDROP FUNCTION IF EXISTS bloomupdate;\n----\n\n== Functions\n=== Match Function\nThis function performs a byte-by-byte check of `(a \u0026 b) == a`.\nIf true, then `a` may be found in `b`.\nIf false, then `a` is not in `b`.\n\nFunction in SQL:\n[source,sql]\n----\nbloommatch(blob a, blob b)\n----\n\nA Java example of how the function is used:\n[source,java]\n----\nConnection con = ... // Get the db connection\nInputStream is = ... // Input stream containing the bloom filter to locate in the table\nPreparedStatement stmt = con.prepareStatement( \"SELECT * FROM bloomTable WHERE bloommatch( ?, bloomTable.filter );\" );\nstmt.setBlob( 1, is );\nResultSet rs = stmt.executeQuery();\n// Result set now contains all the matching bloom filters from the table.\n----\n=== Update Function\nThis function constructs a new filter byte by byte as the bitwise OR `a | b`.\n\nFunction in SQL:\n[source, SQL]\n----\nbloomupdate( blob a, blob b )\n----\nA Java example of how the function is used:\n[source, java]\n----\nConnection con = ... // Get the db connection\nInputStream is = ... 
// Input stream containing the bloom filter to merge into the stored filter\nPreparedStatement stmt = con.prepareStatement( \"UPDATE bloomTable SET filter=bloomupdate( ?, bloomTable.filter ) WHERE id=?;\" );\nstmt.setBlob( 1, is );\nstmt.setInt( 2, 5 );\nstmt.executeUpdate();\n// Bloom filters on rows with id of 5 have been updated to include values from the blob.\n----\n\n== Development\n\nMySQL client and server headers are required to compile this code.\n\nRun the following in the root directory of the source tree:\n\n[source,shell]\n----\naclocal\nautoconf\nautoheader\nautomake --add-missing\n\n./configure\nmake\nsudo make install\nsudo make installdb\n----\n\nTo remove the library from your system:\n\n[source,shell]\n----\nmake uninstalldb\nmake uninstall\n----\n\n== Spark Example\n\nA short demo of using blf_02 in practice with Apache Spark and Scala.\n\n=== Creating and Storing a Bloom Filter in a Database\n\nIn the following example, we generate a Bloom Filter from a Spark DataFrame\nand store its serialized form in a database for later use.\n\nThe filter is stored in a table alongside a string value.\nWhen searching for a token,\nwe can first check the filter before checking the value.\n\n[source,scala]\n----\n// Generate and upload a Spark Bloom filter to a database\n\nimport spark.implicits._\nimport org.apache.spark.sql._\nimport org.apache.spark.sql.types._\nimport java.sql.DriverManager\nimport org.apache.spark.util.sketch.BloomFilter\nimport java.io.{ByteArrayOutputStream,ByteArrayInputStream, ObjectOutputStream, InputStream}\n\n// Filter parameters\nval expected: Long = 500\nval fpp: Double = 0.3\n\nval dburl = \"DATABASE_URL\"\nval updatesql = \"INSERT INTO `example_strings` (`value`, `filter`) VALUES (?,?)\"\nval conn = DriverManager.getConnection(dburl,\"DB_USERNAME\",\"DB_PASSWORD\")\nval value = \"one two three\"\n\n// Create a Spark Dataframe with values 'one', 'two' and 'three'\n// This emulates a tokenized form of the value field\nval in1 = 
spark.sparkContext.parallelize(List(\"one\",\"two\",\"three\"))\nval df = in1.toDF(\"tokens\")\n\nval ps = conn.prepareStatement(updatesql)\n\n// Create a bloomfilter from the Dataframe\nval filter = df.stat.bloomFilter($\"tokens\", expected, fpp)\nprintln(filter.mightContain(\"one\"))\n\n// Write a filter bit array to the output stream\nval baos = new ByteArrayOutputStream\nfilter.writeTo(baos)\nval is: InputStream = new ByteArrayInputStream(baos.toByteArray())\nps.setString(1, value)\nps.setBlob(2,is)\nval update = ps.executeUpdate\nprintln(\"Updated rows: \"+ update)\ndf.show()\nconn.close()\n----\n\n=== Finding Matching Filters\nA Bloom Filter is created from a Spark DataFrame\nand compared with stored filters in the database to retrieve matching string values.\nNote that each comparison generates a new Bloom Filter for the SQL function.\n\nImagine we want to search if a value\ncontains tokens `one` and `two` from the previous example.\n[source,scala]\n----\n// Create a bloomfilter and find matches\nimport spark.implicits._\nimport org.apache.spark.sql._\nimport org.apache.spark.sql.types._\nimport java.sql.DriverManager\nimport org.apache.spark.util.sketch.BloomFilter\nimport java.io.{ByteArrayOutputStream,ByteArrayInputStream, ObjectOutputStream, InputStream}\n\n// Generated filter array must have the same length as the one it is compared to\nval expected: Long = 500\nval fpp: Double = 0.3\n\nval dburl = \"DATABASE_URL\"\nval conn = DriverManager.getConnection(dburl,\"DB_USERNAME\",\"DB_PASSWORD\")\n\nval updatesql = \"SELECT `value` FROM `example_strings` WHERE bloommatch(?, `example_strings`.`filter`);\"\nval ps = conn.prepareStatement(updatesql)\n\n// Creating a filter with values 'one' and 'two'\nval in2 = spark.sparkContext.parallelize(List(\"one\",\"two\"))\nval df2 = in2.toDF(\"tokens\")\nval filter = df2.stat.bloomFilter($\"tokens\", expected, fpp)\n\nval baos = new ByteArrayOutputStream\n            filter.writeTo(baos)\n            baos.flush()\n    
        val is: InputStream = new ByteArrayInputStream(baos.toByteArray())\n            ps.setBlob(1, is)\n            val rs = ps.executeQuery\n\n// Will find a match since tokens searched are both in the filter\nval resultList = Iterator.from(0).takeWhile(_ =\u003e rs.next()).map(_ =\u003e rs.getString(1)).toList\nprintln(\"Found matches: \" + resultList.size)\nconn.close()\n----\n== Contributing\n\n// Change the repository name in the issues link to match with your project's name\n\nYou can get involved in our project by https://github.com/teragrep/blf_02/issues/new/choose[opening an issue] or submitting a pull request.\n\nContribution requirements:\n\n. *All changes must be accompanied by a new or changed test.* If you think testing is not required in your pull request, include a sufficient explanation of why you think so.\n. Security checks must pass.\n. Pull requests must align with the principles and http://www.extremeprogramming.org/values.html[values] of extreme programming.\n. Pull requests must follow the principles of Object Thinking and Elegant Objects (EO).\n\nRead more in our https://github.com/teragrep/teragrep/blob/main/contributing.adoc[Contributing Guideline].\n\n=== Contributor License Agreement\n\nContributors must sign the https://github.com/teragrep/teragrep/blob/main/cla.adoc[Teragrep Contributor License Agreement] before a pull request is accepted into the organization's repositories.\n\nYou need to submit the CLA only once. After submitting the CLA, you can contribute to all of Teragrep's repositories.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteragrep%2Fblf_02","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteragrep%2Fblf_02","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteragrep%2Fblf_02/lists"}