{"id":18263843,"url":"https://github.com/atomist/microgrammar","last_synced_at":"2025-04-04T20:31:24.875Z","repository":{"id":55127394,"uuid":"94803633","full_name":"atomist/microgrammar","owner":"atomist","description":"Atomist microgrammar NPM TypeScript module","archived":false,"fork":false,"pushed_at":"2021-01-08T04:41:41.000Z","size":1070,"stargazers_count":26,"open_issues_count":0,"forks_count":4,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-20T19:07:37.476Z","etag":null,"topics":["node"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/atomist.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null}},"created_at":"2017-06-19T17:36:05.000Z","updated_at":"2023-02-26T14:12:46.000Z","dependencies_parsed_at":"2022-08-14T12:50:32.684Z","dependency_job_id":null,"html_url":"https://github.com/atomist/microgrammar","commit_stats":null,"previous_names":[],"tags_count":302,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atomist%2Fmicrogrammar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atomist%2Fmicrogrammar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atomist%2Fmicrogrammar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/atomist%2Fmicrogrammar/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/atomist","download_url":"https://codeload.github.com/atomist/microgrammar/tar.gz/refs/heads/master","host":{"name":"GitHub
","url":"https://github.com","kind":"github","repositories_count":247246402,"owners_count":20907789,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["node"],"created_at":"2024-11-05T11:12:53.806Z","updated_at":"2025-04-04T20:31:24.063Z","avatar_url":"https://github.com/atomist.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# @atomist/microgrammar\n\n[![atomist sdm goals](http://badge.atomist.com/T29E48P34/atomist/microgrammar/92d2035b-575e-41c4-9088-996dc70d69c2)](https://app.atomist.com/workspace/T29E48P34)\n[![npm version](https://img.shields.io/npm/v/@atomist/microgrammar.svg)](https://www.npmjs.com/package/@atomist/microgrammar)\n\nParsing library written in [TypeScript][ts], filling the large gap\nbetween the sweet spots of regular expressions and full-blown\n[BNF][bnf] or equivalent grammars.  It can parse and cleanly update\nstructured content.\n\n[API Doc](https://atomist.github.io/microgrammar/)\n\n[ts]: https://www.typescriptlang.org/ (TypeScript)\n\n## Concepts\n\n**Microgrammars** are a powerful way of parsing structured content\nsuch as source code, described in this [Stanford paper][mg-paper].\nMicrogrammars are designed to recognize structures in a string or\nstream and extract their content: For example, to recognize a Java\nmethod that has a particular annotation and to extract particular\nparameters. 
They are more powerful and [typically more\nreadable][regex-hell] than [regular expressions][regex] for complex\ncases, although they can be built using regular expressions.\n\n[mg-paper]: http://web.stanford.edu/~mlfbrown/paper.pdf (How to build static checking systems using orders of magnitude less code Brown et al., ASPLOS 2016)\n\nAtomist microgrammars go beyond the Stanford paper example in that\nthey permit _updating_ as well as matching, preserving positions. They\nalso draw inspiration from other experience and sources such as the\nold [SNOBOL programming language][snobol].\n\n[snobol]: https://en.wikipedia.org/wiki/SNOBOL (SNOBOL Programming Language)\n[regex-hell]: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454\n[regex]: https://en.wikipedia.org/wiki/Regular_expression\n\n## Examples\n\nThere are two styles of use:\n\n-   From *definitions*: Defining a grammar in JavaScript objects representing the subcomponents (lower level productions)\n-   From strings: Defining a grammar in a string that resembles input\n    that will be matched\n        \nA microgrammar has a return type defined by its definitions. Each match implements this interface and also the `PatternMatch` interface, which exposes the offset within the input and matched value, which may differ from the exposed typed value. (For example, a `Person` might have a `forename` and `surname`, but its `$matched` value might include the entire matched string with whitespace.) The fields of the `PatternMatch` interface begin with a `$` to ensure that they are out of band.\n\nWhen you've defined a microgrammar, you can use it to match input: usually a string.\n\nGenerator-style iteration is usually most efficient, and looks like this:\n\n```typescript\nconst matches = myMicrogrammar.matchIterator(inputString);\nfor (const match of matches) {\n\t// Do with match. 
You can jump out of the generator here.\n}\n```\nYou can also get all matches in one pass, like this:\n\n```typescript\nconst matches = myMicrogrammar.findMatches(inputString);\nfor (const match of matches) {\n\t// Do with match\n}\n```\n\nIf you are seeking only one match, you can use a method that returns a match or `undefined`, as follows:\n\n```typescript\nconst match = myMicrogrammar.firstMatch(inputString);\nif (match) {\n\t// Do with match\n}\n```\n\n### Definitions style\n\nHere's a simple example:\n\n```typescript\nconst mg = microgrammar\u003c{name: string, age: number}\u003e({\n    name: /[a-zA-Z0-9]+/,\n    _col: \":\",\n    age: Integer\n});\n\nconst results = mg.findMatches(\"-celine:61 greg*^ tom::: mandy:11\");\nassert(results.length === 2);\nconst first = results[0];\nassert(first.$matched === \"celine:61\");\n// The offset of this match was the 1st character, as the 0th was discarded\nassert(first.$offset === 1);\nassert(first.name === \"celine\");\nassert(first.age === 61);\n```\n\nSome notes:\n\n-   A microgrammar definition is typically an object literal, with its\n    properties being matched in turn. This is like **concatenation**\n    in a BNF grammar.\n-   Matcher property values can be regular expressions (like\n    `/[a-zA-Z0-9]+/` here), string literals (like `:`), or custom\n    matchers (like `Integer`). It's easy to define custom matchers for\n    use in composition.\n-   All properties need to match for the whole microgrammar to match.\n-   Properties that match are bound to the result, unless their names begin with `_`, in which\n    case the values are discarded.\n-   Certain out of band values, beginning with `$`, are added to the\n    results, showing the exact text that matched, the offset etc.\n-   When using TypeScript, microgrammar returns can be strongly typed. In this case we've\n    used an anonymous type, but we could also use an interface. 
We\n    could also use untyped, JavaScript style.\n-   Matching skips junk such as `greg*^ tom:::`. In this case, `greg`\n    and `tom:` will look like the start of valid matches, but the\n    first will fail when it can't match a `:` and the second when\n    there isn't a digit after the colon.\n-   We can match against a string or a stream. In this case we've used\n    a string. In stream matching, we'd be more likely to use an\n    API offering callbacks rather than building an array, so we don't\n    need to hold all our matches in memory at once.\n\nOf course, such a simple example could easily be handled by a regular\nexpression and capture groups. But the power becomes apparent with\nnested productions and more elaborate matchers.\n\nA more complex example, showing composition:\n\n```typescript\nexport const CLASS_NAME = /[a-zA-Z_$][a-zA-Z0-9_$]+/;\n\n// Any annotation we're not interested in\nconst DiscardedAnnotation = {\n    _at: \"@\",\n    _annotationName: CLASS_NAME,\n    _content: optional(JavaParenthesizedExpression),\n};\n\nconst SpringBootApp = microgrammar\u003c{ name: string }\u003e({\n    _app: \"@SpringBootApplication\",\n    _content: optional(JavaParenthesizedExpression),\n    _otherAnnotations: zeroOrMore(DiscardedAnnotation),\n    _visibility: optional(\"public\"),\n    _class: \"class\",\n    name: CLASS_NAME,\n});\n```\n\nThis will match content like this:\n\n```java\n@SpringBootApplication\n@Foo\n@Bar(name = \"Baz\", magicParam = 31754)\npublic class MySpringBootApplication\n```\n\nNotes:\n\n-   `JavaParenthesizedExpression` is a built-in matcher constant that\n    matches any valid Java content within `(...)`. It uses a state\n    machine. It's easy to write such custom matchers.\n-   By default, microgrammars are tolerant of whitespace, treating it\n    as a token separator. 
This is the behavior we want when parsing\n    most languages or configuration formats.\n-   Because the other properties have names beginning with `_`, only\n    the class name (`MySpringBootApplication` in our example) is bound\n    to the result. We care about the structure of the rest of the\n    class declaration, but we don't need to extract other values in\n    this particular case.\n\n### String style\n\nThis is a higher level usage model in which a string resembling the\ndesired input but with variable placeholders is used to define the\ngrammar.\n\nThis style is ideally suited for simpler grammars. For example:\n\n```typescript\nconst ValuePredicateGrammar = microgrammar\u003cPredicate\u003e({\n    phrase: \"@${name}='${value}'\"});\n```\n\nIt can be combined with the definitional style through providing\noptional definitions for the named fields. For example, to constrain\nthe match on a name in the above example using a regular expression:\n\n```typescript\nconst ValuePredicateGrammar = microgrammar\u003cPredicate\u003e({\n    phrase: \"@${name}='${value}'\", \n    terms: {\n    \tname: /[a-z]+/\n    }\n});\n```\n\nAs with the object definitional style, whitespace is ignored by default.\n\nFurther documentation can be found in the\n[reference](docs/reference.md).  You can also take a look at the tests\nin this repository.\n\n## Alternatives and when to use microgrammars\n\nMicrogrammars have obvious similarities to [BNF grammars][bnf], but\ndiffer in some important respects:\n\n-   They are intended to match and explain _parts_ of the input, rather\n    than the whole input\n-   They excel at skipping content they are uninterested in\n-   They are not necessarily context free\n-   They do not need to construct a full AST, although they construct\n    ASTs for structures they do match. 
Thus they can easily cope with\n    partially structured data, happily skipping over incomprehensible content\n\n[bnf]: https://en.wikipedia.org/wiki/Backus–Naur_form (Backus–Naur Form)\n\nSimilarities are:\n\n-   The idea of **productions**\n-   Composability, including the ability to reuse productions in\n    different grammars\n-   Operations such as _alternative_, _optional_ and _rep_, that\n    enable building complex structures.\n\nCompared to regular expressions, microgrammars are:\n\n-   Capable of handling greater complexity\n-   More composable\n-   Higher level, able to use regular expressions as building blocks\n-   Capable of expressing nested structures\n-   Arbitrarily extensible through TypeScript function predicates and\n    custom **matchers**\n\nWhile it would be overkill to use a microgrammar for something that\ncan be expressed in a simple regex, microgrammars tend to be clearer\nfor complex cases.\n\n## Usage\n\nThe [`@atomist/microgrammar` package][mg-npm] contains both the\nTypeScript typings and compiled JavaScript.  
You can use this project\nby adding the dependency in your `package.json`.\n\n```\n$ npm install --save @atomist/microgrammar\n```\n\n[mg-npm]: https://www.npmjs.com/package/@atomist/microgrammar (@atomist/microgrammar Node.js Package)\n\n## Troubleshooting\n\nIf you struggle to make your microgrammars match, please refer to the [troubleshooting page][trouble].\n\n[trouble]: docs/trouble.md (Troubleshooting microgrammars)\n\n## Performance considerations\n\nSee [Writing efficient microgrammars][efficiency].\n\n[efficiency]: docs/performance.md (Writing efficient microgrammars)\n\n## Support\n\nGeneral support questions should be discussed in the `#help`\nchannel in the [Atomist community Slack workspace][slack].\n\nIf you find a problem, please create an [issue][].\n\n[issue]: https://github.com/atomist/microgrammar/issues\n\n## Development\n\nYou will need to install [Node.js][node] to build and test this\nproject.\n\n[node]: https://nodejs.org/ (Node.js)\n\n### Build and test\n\nInstall dependencies.\n\n```\n$ npm install\n```\n\nUse the `build` package script to compile, test, lint, and build the\ndocumentation.\n\n```\n$ npm run build\n```\n\n### Release\n\nReleases are handled via the [Atomist SDM][atomist-sdm].  Just press\nthe 'Approve' button in the Atomist dashboard or Slack.\n\n[atomist-sdm]: https://github.com/atomist/atomist-sdm (Atomist Software Delivery Machine)\n\n---\n\nCreated by [Atomist][atomist].\nNeed Help?  [Join our Slack workspace][slack].\n\n[atomist]: https://atomist.com/ (Atomist - How Teams Deliver Software)\n[slack]: https://join.atomist.com/ (Atomist Community Slack)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatomist%2Fmicrogrammar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fatomist%2Fmicrogrammar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatomist%2Fmicrogrammar/lists"}