{"id":23226779,"url":"https://github.com/botsquad/bubble-match","last_synced_at":"2025-05-07T17:49:18.537Z","repository":{"id":54443458,"uuid":"255955062","full_name":"botsquad/bubble-match","owner":"botsquad","description":"NLU match expression engine","archived":false,"fork":false,"pushed_at":"2025-02-27T12:02:41.000Z","size":1656,"stargazers_count":22,"open_issues_count":3,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-19T05:56:49.574Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/botsquad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-15T15:13:35.000Z","updated_at":"2025-02-28T15:02:46.000Z","dependencies_parsed_at":"2024-03-07T16:39:48.063Z","dependency_job_id":"41e5d6ee-2203-4dfe-acd3-84428e00bcbe","html_url":"https://github.com/botsquad/bubble-match","commit_stats":{"total_commits":177,"total_committers":2,"mean_commits":88.5,"dds":0.02259887005649719,"last_synced_commit":"b58e83d97200b37768d98a43d62b102a6466072b"},"previous_names":[],"tags_count":44,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botsquad%2Fbubble-match","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botsquad%2Fbubble-match/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botsquad%2Fbubble-match/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botsquad%2Fbub
ble-match/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/botsquad","download_url":"https://codeload.github.com/botsquad/bubble-match/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252931262,"owners_count":21827102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-19T00:19:26.043Z","updated_at":"2025-05-07T17:49:18.515Z","avatar_url":"https://github.com/botsquad.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Bubblescript Matching Language (BML)\n\n[![Build Status](https://github.com/botsquad/bubble-match/workflows/test/badge.svg)](https://github.com/botsquad/bubble-match)\n[![Hex pm](http://img.shields.io/hexpm/v/bubble_match.svg?style=flat)](https://hex.pm/packages/bubble_match)\n\nBML is a rule language for matching natural language against a rule\nbase. Think of it as regular expressions for _sentences_. Whereas\nregular expressions work on individual characters, BML rules primarily\nwork on a tokenized representation of the string.\n\nBML ships with a builtin string tokenizer, but for production usage\nyou should look into using a language-specific tokenizer, e.g. to use\nthe output of [Spacy's Doc.to_json][spacy] function.\n\n[spacy]: https://spacy.io/api/doc#to_json\n\nThe full documentation on the BML syntax and the API reference is\navailable [on hexdocs.pm](https://hexdocs.pm/bubble_match/). 
To try\nout BML, [check out the demo\nenvironment](https://bml.botsquad.com/), powered by Phoenix\nLiveView.\n\n## Examples\n\nMatching basic sequences of words:\n\n| Match string  | Example           | Matches? |\n| ------------- | ----------------- | -------- |\n| `hello world` | Hello, world!     | **yes**  |\n| `hello world` | Well hello world  | **yes**  |\n| `hello world` | hello there world | no       |\n| `hello world` | world hello       | no       |\n\nMatching regular expressions:\n\n| Match string | Example | Matches? |\n| ------------ | ------- | -------- |\n| `/[a-z]+/`   | abcd    | **yes**  |\n\nMatching entities, with the help of Spacy and Duckling to preprocess and\ntokenize the input:\n\n| Match string | Matches                         | Does not match  |\n| ------------ | ------------------------------- | --------------- |\n| `[person]`   | George Baker                    | Hello world     |\n| `[time]`     | I walked to the store yesterday | My name is John |\n\n## Rules overview\n\nThe match syntax is composed of adjacent, and optionally nested,\nrules. Each individual rule has the following syntax:\n\n### Basic words\n\n`hello world`\n\nBasic words; rules consisting of only alphanumeric characters.\n\nMatching is done on both the lowercased, normalized, accents-removed\nversion of the word, and on the lemma of the word. The _lemma_\nof a word is its base version; e.g. for verbs it is the root form (are\n→ be, went → go); for nouns it is the singular form of the word.\n\nSome languages (German, Dutch, …) have _compound nouns_ that are often\nwritten both with and without spaces or dashes. Use a dash (`-`) to\nmatch on such compound nouns: the rule `was-machine` matches all of\n`wasmachine`, `was-machine` and `was machine`.\n\nThe apostrophe sign is also supported as part of a word, for instance\nwhen matching something like `Martha's cookies`. In this case, the\napostrophe `'s` part is called the _particle_. 
Where the\napostrophe stands for a contracted verb, e.g. in `he'll do that`, you can write the verb\n(\"will\") in full in the BML, as Spacy will determine the proper\nverb. In that case, the BML query would be `he will do that`, which\nwould also match the version with the apostrophe. The same goes for\n`don't`, `he's`, etc.\n\n### Literals\n\n`\"Literal word sequence\"`\n\nMatches a literal piece of text, which can span multiple\ntokens. Matching is **case insensitive**, and also insensitive to\nthe presence of accented characters.\n\n### Ignoring tokens: \\_\n\n`hello _ world`\n\nA standalone `_` matches 0-5 tokens of any kind,\nnon-greedily. This can be used in places where you expect a few tokens\nto occur but you don't care what they are.\n\n### Matching a range of tokens\n\n- `[1]` match exactly one token; any token\n- `[2+]` match 2 or more tokens (greedy)\n- `[1-3]` match 1 to 3 tokens (greedy)\n- `[2+?]` match 2 or more tokens (non-greedy)\n- `[1-3?]` match 1 to 3 tokens (non-greedy)\n\nUse this when you know how many tokens you need to match, but the\ncontents of the tokens do not matter.\n\n### Entities\n\nEntity tokens: `[email]` matches a token of type `:entity` with\nvalue.kind == `email`. Entities are extracted by external means,\ne.g. 
by an NLP NER engine like Duckling.\n\nEntities are automatically captured under a variable with the same\nname as the entity's kind.\n\nThe default list of supported entities is the following:\n\n- `amount_of_money` (duckling)\n- `credit_card_number` (duckling)\n- `date` (spacy)\n- `distance` (duckling)\n- `duration` (duckling)\n- `email` (duckling)\n- `event` (spacy)\n- `fac` (spacy)\n- `gpe` (spacy)\n- `language` (spacy)\n- `law` (spacy)\n- `loc` (spacy)\n- `money` (spacy)\n- `norp` (spacy)\n- `number` (duckling)\n- `ordinal` (duckling)\n- `org` (spacy)\n- `percent` (spacy)\n- `person` (spacy)\n- `phone_number` (duckling)\n- `product` (spacy)\n- `quantity` (duckling)\n- `temperature` (duckling)\n- `time` (duckling)\n- `url` (duckling)\n- `volume` (duckling)\n- `work_of_art` (spacy)\n\nFrom our experience, Duckling entities work much better than Spacy\nentities, and are preferred for use. Besides being more accurate, the\nDuckling entities also provide more metadata, like valid UTC times\nwhen a date is recognized.\n\n### Regular expressions\n\n`/regex/`\n\nMatches the given regex against the sentence. Regexes can span\nmultiple tokens, thus you can match on whitespace and other token\nseparators. Regular expressions are **case insensitive**.\n\nRegular expression named capture groups are also supported, to capture\na specific part of a string: `/KL(?\u003cflight_number\u003e\\d+)/` matches\nKL12345 and extracts `12345` as the `flight_number` capture.\n\n### Per-token regular expressions\n\n`/regex/T`\n\nThe special regex flag `T` is used to indicate that the regex should be run\nagainst a single token instead of against the raw text of the sentence.\n\nThis will make the regex capturing much more 'narrow'. 
The regex start and end\nsymbols (`^` and `$`) are automatically added to the regex, so e.g. the BML\n`/\\d{4}/T` will match the token \"1234\" but _not_ \"12345\".\n\n### OR / grouping construct\n\nUse parentheses combined with the pipe `|` character to specify an OR clause.\n\n- `pizza | fries | chicken` - OR-clause on the root level without\n  parens, matches any one of the three tokens\n\n- `a ( a | b | c )` - use parentheses to separate OR-clauses;\n  matches the token `a` followed by one of `a`, `b` or\n  `c`.\n\n- `( hi | hello )[=greeting]` matches 1 token and stores it in `greeting`\n\nParentheses can also be used to capture a sequence of tokens together as one group:\n\n- `( a )[3+]` matches 3 or more tokens consisting of `a`\n\n### Permutation construct\n\nThe permutation construct, using pointy brackets (`\u003c` and `\u003e`), matches the\ngiven rules in no particular order.\n\n`\u003c a b c \u003e` matches any permutation of the sequence `a b c`; `a c b`, or `b a c`, or `c a b`, etc.\n\nAn implicit `_` is inserted between all rules. 
So the rule `\u003ca b\u003e` can\nalso be written as `(a _ b | b _ a)`.\n\n### Start / End sentence markers\n\nTo match the beginning or end of sentences, the following constructs can be used:\n\n- `[Start]` Matches the start of a sentence\n- `[End]` Matches the end of a sentence\n\n\u003e The `[Start]` and `[End]` symbols are not always the same as the\n\u003e start and end of the input string, as sometimes the user input is\n\u003e split into multiple sentences, based on the Spacy sentence\n\u003e tokenizer.\n\n### Part-of-speech tags (word kinds)\n\nTo be able to disambiguate between word kinds, the `%` construct\nmatches on the POS-tag of a token:\n\n- `%VERB` matches any verb\n- `%NOUN` matches any noun\n\nAny other [Spacy POS tags](https://spacy.io/api/annotation#pos-en) are\nvalid as well.\n\n### Modifiers\n\n#### Capture modifier\n\n`(my name is _)[=x]` stores the entire matched token sequence, e.g. \"My name is john\", in `x`.\n\n#### Optionality modifier\n\nAn appended `?` makes the given rule optional (it needs to occur 0 or 1 times).\n\n#### Repetition modifier\n\nAny rule can have a `[]` block which contains a repetition modifier\nand/or a capture expression.\n\n- `a[1]` match exactly one `a` word\n- `a[2+]` match 2 or more `a`'s (greedy)\n- `a[1-3]` match 1 to 3 `a`'s (greedy)\n- `a[2+?]` match 2 or more `a`'s (non-greedy)\n- `a[1-3?]` match 1 to 3 `a`'s (non-greedy)\n\n\n### Punctuation\n\nPunctuation is optional, and can be ignored while creating match\nrules. 
However, punctuation tokens _are_ stored in the tokenized\nversion of the input; in fact, multiple _tokenizations_ of the input\nare stored for each sentence, one without and one with the\npunctuation.\n\nThe sentence `Hello, world.` is stored both as:\n\n- `Hello` `world`\n- `Hello` `,` `world` `.`\n\nPunctuation can be matched by enclosing it in `'`\nliteral quotes.\n\n## Sentence tokenization\n\nThe expression matching works on a per-sentence basis; the idea is\nthat it does not make sense to create expressions that span multiple\nsentences.\n\nThe builtin sentence tokenizer (`BubbleMatch.Sentence.Tokenizer`) does\n**not** have the concept of sentences, and thus treats each input as a\nsingle sentence, even when the input contains periods.\n\nHowever, the preferred way of using this library is by running the\ninput through an NLP preprocessor like Spacy, which does tokenize an\ninput into individual sentences.\n\n## Sigil\n\nFor use within Elixir, it is possible to use a `~m` sigil which parses\nthe given BML query at compile time:\n\n```elixir\ndefmodule MyModule do\n  use BubbleMatch.Sigil\n\n  def greeting?(input) do\n    BubbleMatch.match(~m\"hello | hi | howdy\", input) != :nomatch\n  end\nend\n```\n\n## Installation\n\nIf [available in Hex](https://hex.pm/docs/publish), the package can be installed\nby adding `bubble_match` to your list of dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [\n    {:bubble_match, \"~\u003e 0.1.0\"}\n  ]\nend\n```\n\nDocumentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)\nand published on [HexDocs](https://hexdocs.pm). 
Once published, the docs can\nbe found at [https://hexdocs.pm/bubble_match](https://hexdocs.pm/bubble_match).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbotsquad%2Fbubble-match","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbotsquad%2Fbubble-match","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbotsquad%2Fbubble-match/lists"}