{"id":17912705,"url":"https://github.com/stypox/dicio-sentences-compiler","last_synced_at":"2026-03-09T19:02:33.310Z","repository":{"id":44908964,"uuid":"195646511","full_name":"Stypox/dicio-sentences-compiler","owner":"Stypox","description":"Sentences-compiler for Dicio assistant","archived":false,"fork":false,"pushed_at":"2024-07-23T16:55:11.000Z","size":378,"stargazers_count":10,"open_issues_count":2,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-02T04:03:38.052Z","etag":null,"topics":["assistant","assistive-technology","compiler","dicio","dicio-assistant","dicio-sentences-language","personal-assistant","personal-assistant-framework","voice-assistant"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Stypox.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-07T12:07:57.000Z","updated_at":"2025-02-21T15:51:33.000Z","dependencies_parsed_at":"2024-02-28T12:28:25.457Z","dependency_job_id":"02d7f17e-2977-449a-8841-eab599b1dac5","html_url":"https://github.com/Stypox/dicio-sentences-compiler","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stypox%2Fdicio-sentences-compiler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stypox%2Fdicio-sentences-compiler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stypox%2Fdicio-sentences-compiler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stypox%2Fdicio-sentences-compiler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Stypox","download_url":"https://codeload.github.com/Stypox/dicio-sentences-compiler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244326202,"owners_count":20435122,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assistant","assistive-technology","compiler","dicio","dicio-assistant","dicio-sentences-language","personal-assistant","personal-assistant-framework","voice-assistant"],"created_at":"2024-10-28T19:46:38.497Z","updated_at":"2026-03-09T19:02:33.210Z","avatar_url":"https://github.com/Stypox.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sentences compiler for Dicio assistant\nThis tool provides a simple way to generate **sentences to be matched** for the Dicio assistant. It compiles files written in the **Dicio-sentences-language to Java code** that can easily be imported into projects using the [interpreter](https://github.com/Stypox/dicio-skill/) of the Dicio assistant. It lets you **pack** similar sentences together while preserving **readability**.\n\nThis repository is part of the **Dicio** project. Also check out [`dicio-android`](https://github.com/Stypox/dicio-android), [`dicio-skill`](https://github.com/Stypox/dicio-skill/) and [`dicio-numbers`](https://github.com/Stypox/dicio-numbers). 
*Open to contributions :-D*\n\n## Dicio sentences language\nEach file can contain multiple sections, each starting with the section information and followed by a list of sentences. The section information is formatted like `SECTION_ID:SPECIFICITY`, where SPECIFICITY can be `low`, `medium` or `high`, representing **how specific the set of sentences is**. For example, a section that matches queries about phone calls is very specific, while one that matches every question about famous people has a lower specificity. The specificity is needed to **prevent conflicts** between two sections that both match with a high score: the more specific one is preferred.\nThen the sentences follow: every sentence is made of an optional **sentence id** (formatted like `[SENTENCE_ID]` and used for sentence **identification purposes**) and a **list of constructs**, terminated by a `;`. Constructs can be:\n- **diacritics-insensitive word** (e.g. `hello`). A simple word: it can contain uppercase and lowercase Unicode letters. Diacritics and accents are ignored while matching. E.g. `hello` matches `hèllo`, `héllò`, ...\n- **diacritics-sensitive word** (e.g. `\"hello\"`). Just like the diacritics-insensitive word, but in this case diacritics count. E.g. `\"hello\"` matches only the exact `hello`.\n- **word with variations** (e.g. `\u003ce|g?\u003email` (diacritics-insensitive) or `\"\u003ce|g?\u003email\"` (diacritics-sensitive)). A word with possible variations. This construct is not very useful in English, but comes in handy in languages where words can have multiple declensions. A variations group, i.e. the piece enclosed in the angle brackets `\u003c\u003e`, is a set of variations separated by `|`, optionally followed by a `?` indicating the empty variation. A word can have multiple variations groups, e.g. `\u003ca|b\u003ec\u003cd?\u003e\u003ce|f?\u003e` matches all of `acde`, `acdf`, `acd`, `ace`, `acf`, `ac`, `bcde`, `bcdf`, `bcd`, `bce`, `bcf`, `bc`. 
Pay attention to spaces: `\u003ca|b\u003e c\u003cd?\u003e\u003ce|f?\u003e` would be 2 separate words! E.g. `\u003ce|g?\u003email` matches `email`, `gmail` and `mail`.\n- **or-red constructs** (e.g. `hello|hi`). Any one of the or-red constructs can match. E.g. `hello|hi|hey assistant` matches `hello assistant`, `hi assistant` and `hey assistant`.\n- **optional construct** (e.g. `hello?`). This construct can be skipped during matching. E.g. `bye bye?` matches both `bye` and `bye bye`.\n- **parenthesized construct** (e.g. `(hello)`). This lets you pack constructs together and, for example, \"or\" them all, just as parentheses do in math. E.g. `how (are you doing?)|(is it going)` matches `how are you`, `how are you doing` and `how is it going`.\n- **capturing group** (`.NAME.`). This tells the interpreter to match a variable-length list of arbitrary words to that part of the sentence. NAME is the name of the capturing group. E.g. `how are you .person.` matches `how are you Tom` with `Tom` in the \"person\" capturing group.\n\n### Punctuation marks and special characters\n\nNote that punctuation marks should **not** be inserted. Words are made only of letters, and other special characters are part of the language's grammar: the characters `[]\"|?().\u003c\u003e` are interpreted with the special meaning explained above, and any other punctuation mark generates an error. *But this does not mean that Dicio is unable to handle sentences with punctuation marks!* Before being processed, the **input from the user is split into lowercase letter-only words**, so \"It's\" becomes \"it\" and \"s\" (the relevant code is in [`dicio-skill`](https://github.com/Stypox/dicio-skill/)). Therefore, when writing a dicio-sentences-language sentence that could contain **e.g. apostrophes, just replace them with a space** to obtain the same result. 
The case of letters (and their diacritics, for diacritics-insensitive words) is ignored, too.\n\n### Sentence example and explanation\n\n```\nweather: high\n(what s|is)|whats \"the\" weather like? (\u003ci|o\u003en .where.)?;\n```\nThe example above declares a section named \"weather\" with a high specificity (high, since... *what else could weather mean, if not atmospheric conditions?*). Then a sentence follows:\n- `(what s|is)|whats` matches `what + s`, `what + is` and `whats` (and thus also the raw `what's`, since the apostrophe is treated as a word separator; **you cannot** insert `'`, `-` or any other punctuation in `.dslf` files though, see the note [above](#punctuation-marks-and-special-characters)!)\n- `\"the\"` is a diacritics-sensitive word, so only the exact `the` matches: `thè` doesn't. *Note that this word was made diacritics-sensitive just for **demonstration purposes**, since diacritics are usually not an issue in English.*\n- `weather`, just like all of the other words in the sentence except for `\"the\"`, is diacritics-insensitive, so it matches `weather`, `wèathér`, `weàthèr`, ...\n- `like?` is an optional word that matches both `like` and nothing.\n- `(\u003ci|o\u003en .where.)?` is optional, and contains a word with variations and a capturing group. `\u003ci|o\u003en` matches both `in` and `on`. Therefore, overall, it matches `in milan` and `on the moon`, with `milan` and `the moon` respectively in the \"where\" capturing group, or it matches nothing, leaving the capturing group empty.\n\nSo all of the following user inputs would match the above sentence perfectly:\n- `What's the wéather in London?` (`London` is captured in the \"where\" capturing group)\n- `wHat is THE weaTher lIke?` (no \"where\" specified)\n- `whÀts the weather` (also no \"where\" specified)\n\n## Compilation process\nWhen a compilation is issued, `dicio-sentences-compiler` first parses the provided file and builds a syntax tree. 
Then every sentence is analyzed and converted into a format that allows running an `O(number of words)` **depth-first search** on it with as little runtime overhead as possible. Every word in the sentence is assigned a **unique index**, a list of indices of **all words that could come next**, and the minimum number of **words to skip** to get to the end of the sentence. The index is used *(you guessed it!)* just for indexing. The list of next word indices is needed to instantly determine the possible next words during a depth-first search. The minimum number of words to skip to get to the end allows lowering the score accordingly during the search, without having to recalculate it at runtime. When a **section is put together**, besides the list of compiled and analyzed sentences, it holds the [**specificity value**](https://github.com/Stypox/dicio-skill/#input-recognizer) and (if applicable) the **list of all capturing group names**, so that they can be compiled to variables in the target language, for convenience's sake and to prevent typos, much like with Android's `R` class.\n\n### Java\nThe compilation to Java relies on the [`dicio-skill`](https://github.com/Stypox/dicio-skill) library, so sections are compiled in this format:\n```java\nStandardRecognizerData SECTION_NAME = new StandardRecognizerData(\n        InputRecognizer.Specificity.SPECIFICITY,\n        new Sentence(SENTENCE_ID, LIST_OF_STARTING_WORD_INDICES,\n                new DiacriticsSensitiveWord(VALUE, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),\n                new DiacriticsInsensitiveWord(NORMALIZED_VALUE, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),\n                new DiacriticsSensitiveRegexWord(REGEX, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),\n                new DiacriticsInsensitiveRegexWord(REGEX, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),\n                new CapturingGroup(NAME, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),\n                new ...(...), ...),\n       
 new Sentence(...), ...);\n```\nIf a section collected capturing group names, they are compiled to fields of the section, by `extend`ing `StandardRecognizerData`, that is:\n```java\nclass SectionClass_SECTION_NAME extends StandardRecognizerData {\n        SectionClass_SECTION_NAME() { super(... INITIALIZED AS ABOVE ...); }\n        public final String CAPTURING_GROUP_1 = \"CAPTURING_GROUP_1\", CAPTURING_GROUP_2 = \"CAPTURING_GROUP_2\", ...;\n}\nSectionClass_SECTION_NAME SECTION_NAME = new SectionClass_SECTION_NAME();\n```\nIf a section map name is provided via the `--create-section-map` parameter, a `Map\u003cString, StandardRecognizerData\u003e` is created, mapping each section id to its corresponding `StandardRecognizerData` instance. This can be useful for code-generation scripts (like the one in [`dicio-android`'s `build.gradle`](https://github.com/Stypox/dicio-android/blob/master/app/build.gradle)), in combination with the `--sections-file` parameter.\n\n## Build and run\nTo build the project, open it in Android Studio (IntelliJ IDEA probably works, too) and create an Application configuration in the \"Run/Debug Configurations\" menu: set \"Main class\" to `org.dicio.sentences_compiler.main.SentencesCompiler`, \"Use classpath of module\" to `sentences_compiler` and \"Program arguments\" to the arguments for the compiler. Then run the newly created configuration with the \"Run\" button. Set `--help` as \"Program arguments\" to get a help screen explaining the options.\n\nThis project can also be used as a library. In that case, add `'com.github.Stypox:dicio-sentences-compiler:VERSION'` to your Gradle dependencies, replacing `VERSION` with the latest release or commit. 
Then use the `org.dicio.sentences_compiler.main.SentencesCompiler#compile()` function to compile using input files and output streams (see its Javadoc for details).\n\n## Example\nThe file below is [`example.dslf`](example.dslf). \"dslf\" stands for \"Dicio-Sentences-Language File\".\n```\nmood: high       # comments are supported :-D\nhow (are you doing?)|(is it go\u003cing|ne\u003e);\n[has_place] how is it going over \u003ct?\u003ehere;\n[french] comment \"êtes\" voùs;  # quotes make sure êtes is matched diacritics-sensitively,\n                               # while voùs will be matched the same way as vous\n\nGPS_navigation: medium\n[question]  take|bring me to .place. (by .vehicle.)? please?;\n[question]  give me directions to .place. please?;\n[question]  how do|can i get to .place.;\n[statement] i want to go to .place. (by .vehicle.)?;\n[statement] .place. is the place i want to go to;\n```\nThe above Dicio-sentences-language file is compiled to Java code by running the sentences compiler as explained [above](#build-and-run), with the line below set as \"Program arguments\".\n```sh\n--input \"example.dslf\" --output \"ClassName.java\" --sections-file \"stdout\" java --variable-prefix \"section_\" --package \"com.pkg.name\" --class \"ClassName\" --create-section-map \"sections\"\n```\nAfter clicking on the \"Run\" button, `mood GPS_navigation` should be printed, and the Java code shown below should be in a file called `ClassName.java` in the root directory of the repository. Indentation and spacing were added manually to improve readability.\n```java\n/*\n * FILE AUTO-GENERATED BY dicio-sentences-compiler. 
DO NOT MODIFY.\n */\n\npackage com.pkg.name;\n\nimport java.util.Map;\nimport java.util.HashMap;\nimport org.dicio.skill.chain.InputRecognizer.Specificity;\nimport org.dicio.skill.standard.Sentence;\nimport org.dicio.skill.standard.StandardRecognizerData;\nimport org.dicio.skill.standard.word.DiacriticsInsensitiveWord;\nimport org.dicio.skill.standard.word.DiacriticsInsensitiveRegexWord;\nimport org.dicio.skill.standard.word.DiacriticsSensitiveWord;\nimport org.dicio.skill.standard.word.CapturingGroup;\n\npublic class ClassName {\n\tpublic static final StandardRecognizerData section_mood = new StandardRecognizerData(Specificity.high,\n\t\tnew Sentence(\"\", new int[]{0},\n\t\t\tnew DiacriticsInsensitiveWord(\"how\", 4, 1, 4),\n\t\t\tnew DiacriticsInsensitiveWord(\"are\", 3, 2),\n\t\t\tnew DiacriticsInsensitiveWord(\"you\", 2, 3, 7),\n\t\t\tnew DiacriticsInsensitiveWord(\"doing\", 1, 7),\n\t\t\tnew DiacriticsInsensitiveWord(\"is\", 3, 5),\n\t\t\tnew DiacriticsInsensitiveWord(\"it\", 2, 6),\n\t\t\tnew DiacriticsInsensitiveRegexWord(\"go(?:ing|ne)\", 1, 7)),\n\t\tnew Sentence(\"has_place\", new int[]{0},\n\t\t\tnew DiacriticsInsensitiveWord(\"how\", 6, 1),\n\t\t\tnew DiacriticsInsensitiveWord(\"is\", 5, 2),\n\t\t\tnew DiacriticsInsensitiveWord(\"it\", 4, 3),\n\t\t\tnew DiacriticsInsensitiveWord(\"going\", 3, 4),\n\t\t\tnew DiacriticsInsensitiveWord(\"over\", 2, 5),\n\t\t\tnew DiacriticsInsensitiveRegexWord(\"(?:t|)here\", 1, 6)),\n\t\tnew Sentence(\"french\", new int[]{0},\n\t\t\tnew DiacriticsInsensitiveWord(\"comment\", 3, 1),\n\t\t\tnew DiacriticsSensitiveWord(\"êtes\", 2, 2),\n\t\t\tnew DiacriticsInsensitiveWord(\"vous\", 1, 3)));\n\n\tpublic static final class SectionClass_section_GPS_navigation extends StandardRecognizerData{\n\t\tSectionClass_section_GPS_navigation(){\n\t\t\tsuper(Specificity.medium,\n\t\t\t\tnew Sentence(\"question\", new int[]{0, 1},\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"take\", 9, 2),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"bring\", 11, 2),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"me\", 10, 
3),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"to\", 9, 4),\n\t\t\t\t\tnew CapturingGroup(\"place\", 8, 5, 7, 8),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"by\", 6, 6),\n\t\t\t\t\tnew CapturingGroup(\"vehicle\", 5, 7, 8),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"please\", 4, 8)),\n\t\t\t\tnew Sentence(\"question\", new int[]{0},\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"give\", 7, 1),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"me\", 6, 2),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"directions\", 5, 3),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"to\", 4, 4),\n\t\t\t\t\tnew CapturingGroup(\"place\", 3, 5, 6),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"please\", 1, 6)),\n\t\t\t\tnew Sentence(\"question\", new int[]{0},\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"how\", 9, 1, 2),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"do\", 6, 3),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"can\", 8, 3),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"i\", 7, 4),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"get\", 6, 5),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"to\", 5, 6),\n\t\t\t\t\tnew CapturingGroup(\"place\", 4, 7)),\n\t\t\t\tnew Sentence(\"statement\", new int[]{0},\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"i\", 10, 1),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"want\", 9, 2),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"to\", 8, 3),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"go\", 7, 4),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"to\", 6, 5),\n\t\t\t\t\tnew CapturingGroup(\"place\", 5, 6, 8),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"by\", 3, 7),\n\t\t\t\t\tnew CapturingGroup(\"vehicle\", 2, 8)),\n\t\t\t\tnew Sentence(\"statement\", new int[]{0},\n\t\t\t\t\tnew CapturingGroup(\"place\", 10, 1),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"is\", 8, 2),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"the\", 7, 3),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"place\", 6, 4),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"i\", 5, 5),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"want\", 4, 
6),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"to\", 3, 7),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"go\", 2, 8),\n\t\t\t\t\tnew DiacriticsInsensitiveWord(\"to\", 1, 9)));\n\t\t}\n\t\tpublic final String place = \"place\", vehicle = \"vehicle\";\n\t}\n\tpublic static final SectionClass_section_GPS_navigation section_GPS_navigation = new SectionClass_section_GPS_navigation();\n\n\tpublic static final Map\u003cString, StandardRecognizerData\u003e sections = new HashMap\u003cString, StandardRecognizerData\u003e() {{\n\t\tput(\"mood\", section_mood);\n\t\tput(\"GPS_navigation\", section_GPS_navigation);\n\t}};\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstypox%2Fdicio-sentences-compiler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstypox%2Fdicio-sentences-compiler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstypox%2Fdicio-sentences-compiler/lists"}