{"id":17248879,"url":"https://github.com/serras/t-regex","last_synced_at":"2025-04-14T04:50:45.791Z","repository":{"id":22241707,"uuid":"25575043","full_name":"serras/t-regex","owner":"serras","description":"Matchers and grammars using tree regular expressions","archived":false,"fork":false,"pushed_at":"2015-10-23T11:39:33.000Z","size":468,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-27T18:52:24.577Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/serras.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-10-22T10:12:29.000Z","updated_at":"2022-03-25T15:54:29.000Z","dependencies_parsed_at":"2022-08-20T01:00:09.934Z","dependency_job_id":null,"html_url":"https://github.com/serras/t-regex","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serras%2Ft-regex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serras%2Ft-regex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serras%2Ft-regex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serras%2Ft-regex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/serras","download_url":"https://codeload.github.com/serras/t-regex/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248824695,"owners_count":21167343,"icon_url":"
https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T06:42:21.814Z","updated_at":"2025-04-14T04:50:45.767Z","avatar_url":"https://github.com/serras.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"`t-regex`: matchers and grammars using tree regular expressions\n===============================================================\n\n`t-regex` defines a series of combinators to express tree regular\nexpressions over any Haskell data type. In addition, with the use\nof some combinators (and a bit of Template Haskell), it defines\nnice syntax for using these tree regular expressions for matching\nand computing attributes over a term.\n\n## Defining your data type\n\nIn order to use `t-regex`, you need to define your data type in a\nway amenable to inspection. In particular, this means that instead\nof a closed data type, you need to define a *pattern functor* in\nwhich recursion is described using the final argument, and which\nshould instantiate the `Generic1` type class (this can be done\nautomatically if you are using GHC).\n\nFor example, the following block of code defines a `Tree'` data\ntype in the required fashion:\n```haskell\n{-# LANGUAGE DeriveGeneric #-}\nimport GHC.Generics\n\ndata Tree' f = Leaf'\n             | Branch' { elt :: Int, left :: f, right :: f }\n             deriving (Generic1, Show)\n```\nNotice the `f` argument in the place where `Tree'` would usually be\nfound. 
In addition, we have declared the constructors using `'` at\nthe end, but we will get rid of those primes soon.\n\nNow, if you want to create terms, you need to *close* the type, which\nessentially makes the `f` argument refer back to `Tree'`. You do so\nby using `Fix`:\n```haskell\ntype Tree = Fix Tree'\n```\nHowever, this forces you to add explicit `Fix` constructors at\neach level. To alleviate this problem, let's define *pattern synonyms*,\navailable from GHC 7.8 on:\n```haskell\n{-# LANGUAGE PatternSynonyms #-}\n\npattern Leaf = Fix Leaf'\npattern Branch n l r = Fix (Branch' n l r)\n```\nFrom an outsider's point of view, your data type now looks like a normal one,\nwith `Leaf` and `Branch` as its constructors:\n```haskell\naTree = Branch 2 (Branch 2 Leaf Leaf) Leaf\n```\n\n## Tree regular expressions\n\nTree regular expressions are parametrized by a pattern functor: in this\nway they are flexible enough to be used with different data types,\nwhile keeping our beloved Haskell type safety.\n\nThe available combinators to build regular expressions follow the syntax\nof [Tree Automata Techniques and Applications](http://tata.gforge.inria.fr/),\nChapter 2.\n\n#### Emptiness\n\nThe expressions `empty_` and `none` do not match any value. They can be\nused to signal an error branch in an expression.\n\n#### Whole language\n\nYou can match any possible term using `any_`. It is commonly used in\ncombination with `capture` to extract information from a term.\n\n#### Choice\n\nA regular expression of the form `r1 \u003c||\u003e r2` tries to match `r1`, and if\nthat is not possible, it tries to match `r2`. Note that when capturing,\nthe first regular expression is given priority.\n\n#### Injection\n\nOf course, at some point you would like to check whether some term has\na specific shape. In our case, this means that it has been constructed in\nsome specific way. In order to do so, you need to *inject* the\nconstructor as a tree regular expression. 
When doing so, you can use\nthe same syntax as usual, except that you write regular expressions\nat the recursive positions.\n\nLet's make it clearer with an example. In our initial definition we had\na constructor `Branch'` with type:\n```haskell\nBranch' :: Int -\u003e f -\u003e f -\u003e Tree' f\n```\nOnce you inject it using `inj`, the resulting constructor becomes:\n```haskell\ninj . Branch' :: Int -\u003e Regex' c f -\u003e Regex' c f -\u003e Tree' (Regex' c f)\n```\nNotice that fields with no `f` do not change their type. Now, here is how\nyou would represent a tree whose top node is a 2:\n```haskell\ntopTwo = inj (Branch' 2 any_ any_)\n```\n\nIn some cases, you don't want to check the value of a certain position\nwhich is not recursive. In that case, you cannot use `any_`, since that\nposition does not hold a value of the recursive type. Instead, you\nmay use the special value `__`.\n\nFor example, here is how you would represent the shape of a tree which\nhas at least one branch point:\n```haskell\nsomeBranching = inj (Branch' __ any_ any_)\n```\n\n#### Iteration and concatenation\n\nIteration in tree regular expressions is not as easy as in word languages.\nThe reason is that iteration may occur several times, and in different\npositions. For that reason, we need to introduce the notion of *hole*: a\nhole is a placeholder where iteration takes place.\n\nIn `t-regex` hole names are represented as lambda-bound variables. Then,\nyou can use any of the functions `square`, `var` or `#` to indicate a\nposition where the placeholder should be put. 
Iteration is then indicated\nby a call to `iter` or its postfix version `^*`.\n\nThe following are two ways to describe a `Tree` where all internal nodes\ninclude the number `2`:\n```haskell\n{-# LANGUAGE PostfixOperators #-}\n\nregex1 = Regex $\n           iter $ \\k -\u003e\n                  inj (Branch' 2 (square k) (square k))\n             \u003c||\u003e inj Leaf'\n\nregex2 = Regex $ ( (\\k -\u003e inj (Branch' 2 (k#) (k#)) \u003c||\u003e inj Leaf')^* )\n```\nNotice that the use of `PostfixOperators` enables a much terser syntax.\n\nIteration is an instance of a more general construction called *concatenation*,\nwhere a hole in an expression is filled by another given expression. The\ngeneral shape is:\n```haskell\n(\\k -\u003e ... (k#) ...) \u003c.\u003e r  -- k is replaced by r in the first expression\n```\n\n## Matching and capturing\n\nYou can check whether a term `t` is matched by a regular expression `r`\nby simply using:\n```haskell\nr `matches` t\n```\nHowever, the full power of tree regular expressions shows once you\nstart *capturing* subterms of your tree. A capture group is introduced\nin an expression by either `capture` or `\u003c\u003c-`, and all capture groups appearing\nin an expression must be of the same type. For example, we can refine\nthe previous `regex2` to capture leaves:\n```haskell\nregex2 = Regex $ ( (\\k -\u003e inj (Branch' 2 (k#) (k#)) \u003c||\u003e \"leaves\" \u003c\u003c- inj Leaf')^* )\n```\n\nTo check whether a term matches an expression and capture the subterms,\nyou need to call `match`. The result is of type `Maybe (Map c (m (Fix f)))`.\nLet's see what it means:\n\n  * The outermost `Maybe` indicates whether the match is successful\n    or not. A value of `Nothing` indicates that the given term does\n    not match the regular expression,\n  * Capture groups are returned as a `Map`, whose keys are those\n    given at `capture` or `\u003c\u003c-`. 
For that reason, you need capture\n    identifiers to be instances of `Ord`,\n  * Finally, you have a choice about how to save the found subterms,\n    given by the `Alternative m`. Most of the time, you will make\n    `m = []`, which means that all matches of the group will be\n    returned as a list. Another option is using `m = Maybe`, where\n    only the first match is returned.\n\n#### Tree regular expression patterns\n\nCapturing is quite simple, but comes with a problem: it becomes easy to\nmistype the name of a capture group, so your code becomes quite brittle.\nFor that reason, `t-regex` includes matchers which you can use at the\nsame positions where pattern matching is usually done, and which take\ncare of linking capture groups to variable names, making it impossible\nto mistype them.\n\nTo use them, you need to import `Data.Regex.TH`. Then, a quasi-quoter\nnamed `rx` is available to you. Here is an example:\n```haskell\n{-# LANGUAGE QuasiQuotes #-}\n\nexample :: Tree -\u003e [Tree]\nexample [rx| (\\k -\u003e inj (Branch' 2 (k#) (k#)) \u003c||\u003e leaves \u003c\u003c- inj Leaf')^* |]\n           = leaves\nexample _  = []\n```\nThe name of the capture group, `leaves`, is now available in the body\nof the `example` function. There is no need to look inside maps: this\nis all taken care of for you.\n\nNote that when using the `rx` quasi-quoter, you have no choice about\nthe `Alternative m` to use when matching. Instead, the value of each\ncapture group is always a list of terms.\n\nFor those who don't like using quasi-quotation, `t-regex` provides a\nless magical version called `with`. In this case, you need to introduce\nthe variables explicitly, and then pattern match on a tuple\nwrapped inside a `Maybe`. 
The previous example would be written as:\n```haskell\n{-# LANGUAGE ViewPatterns #-}\n\nexample :: Tree -\u003e [Tree]\n-- A view pattern used as a function argument must be parenthesized.\nexample (with (\\leaves -\u003e Regex $ iter $ \\k -\u003e inj (Branch' 2 (k#) (k#))\n                                               \u003c||\u003e leaves \u003c\u003c- inj Leaf' )\n              -\u003e Just leaves)\n           = leaves\nexample _  = []\n```\nNotice that the pattern always has the same shape: `with (\\v1 v2 ... -\u003e\nregular expression) -\u003e Just (v1,v2,...)`.\n\n## Random generation\n\nYou can use `t-regex` to generate random values of a type which satisfy a\ncertain tree regular expression. Of course, you could always generate\nrandom values and then check that they match the given expression, but\nthis is usually very costly and may almost never produce a match.\nInstead, you should use `arbitraryFromRegex`.\n\n```haskell\ninstance Arbitrary Tree where\n  arbitrary = frequency\n    [ (1, return Leaf)\n    , (5, Branch \u003c$\u003e arbitrary \u003c*\u003e arbitrary \u003c*\u003e arbitrary) ]\n\n\u003e arbitraryFromRegex regex2\n```\n\nNote that in the previous example we also gave an instance declaration\nfor `Arbitrary`. This class comes from the `QuickCheck` package, and\nis needed to generate unconstrained random values for the case in\nwhich `any_` is found.\n\nSometimes you may not be able to, or may not want to, write such an instance. In that\ncase, you can use `arbitraryFromRegexAndGen`, which takes an additional\nargument from which `any_` values are generated.\n\n## Attribute grammars\n\n[Attribute grammars](https://www.haskell.org/haskellwiki/The_Monad.Reader/Issue4/Why_Attribute_Grammars_Matter)\nare a powerful way to perform computations over a term. The main\nidea is that each node in your term (when seen as a tree) is\ntraversed, and two sets of information are recorded at each point:\n\n  * *Inherited attributes* go from parent to children. 
When\n    describing a grammar, each node needs to specify the value\n    of the inherited attributes of all its children,\n  * *Synthesized attributes* flow in the other direction.\n    At each node, you need to describe how to get the value\n    of each synthesized attribute based on your inherited\n    attributes and the synthesized attributes of the children.\n\nThe most performant attribute grammar compilers, such as\n[UUAGC](http://foswiki.cs.uu.nl/foswiki/HUT/AttributeGrammarSystem),\nonly allow deciding which rule to apply on a node depending on\nits topmost constructor. With `t-regex` you can look as\ndeep as you want to make this decision (but of course, performance\nwill suffer if you do this very often).\n\nHere is an example of a grammar which computes a graphical\nrepresentation of a `Tree` plus its number of leaves:\n```haskell\ngrammar = [\n    rule $ \\l r -\u003e\n     inj (Branch' 2 (l \u003c\u003c- any_) (r \u003c\u003c- any_)) -\u003e\u003e do\n       (lText,lN) \u003c- use (at l . syn)\n       (rText,rN) \u003c- use (at r . syn)\n       this.syn._1 .= \"(\" ++ lText ++ \")-SPECIAL-(\" ++ rText ++ \")\"\n       this.syn._2 .= lN + rN\n  , rule $ \\l r -\u003e\n     inj (Branch' __ (l \u003c\u003c- any_) (r \u003c\u003c- any_)) -\u003e\u003e\u003e \\(Branch e _ _) -\u003e do\n       check $ e \u003e= 0\n       (lText,lN) \u003c- use (at l . syn)\n       (rText,rN) \u003c- use (at r . syn)\n       this.syn._1 .= \"(\" ++ lText ++ \")-\" ++ show e ++ \"-(\" ++ rText ++ \")\"\n       this.syn._2 .= lN + rN\n  , rule $ inj Leaf' -\u003e\u003e do\n       this.syn._1 .= \"leaf\"\n       this.syn._2 .= Sum 1\n  ]\n```\nLet's dissect it part by part.\n\nFirst of all, a grammar is made of a series of *rules*. Each rule\nfollows the same schema:\n```haskell\nrule $ \\v1 ... vn -\u003e\n  regular expression -\u003e\u003e do\n    actions\n```\n`rule` is the constant part which prefixes every rule. 
Then, you\nhave the set of variables which will be used to capture information\nfrom the term, in a similar way to the previous section. After that you\nhave the tree regular expression the term needs to match. Finally,\nand separated by `-\u003e\u003e`, you find the actions to perform when this\nrule is selected.\n\nTwo small extensions are shown in the second rule. By default,\n`-\u003e\u003e` does not give you access to the matched term. However, if you\nneed to access some of its information (for example, because you\nused `__`, as in this case), you can use the alternative version\n`-\u003e\u003e\u003e`, which passes the matched term as an argument. The second extension is the\nuse of `check` to state a logical condition which is not captured\nby the regular expression itself.\n\nThe syntax for the actions relies heavily on lenses and operators\nfrom the `lens` package. In particular, you have four lenses:\n\n  * `this.inh` gives access to the inherited attributes of the node,\n  * `at n . syn` gives access to the synthesized attributes of children,\n  * `this.syn` is where you set the synthesized attributes of your node,\n  * `at n . inh` is where you set the inherited attributes of children.\n\nFurthermore, you can combine those with lenses over your inherited\nand synthesized attribute data type to have more lightweight syntax.\nIn our case, the synthesized attributes are a tuple `(String, Sum Int)`,\nso we access them with `_1` and `_2`, as shown in the example.\n\nThe general rule is that you read values using `use`, and set values\nvia `.=`. If some value is not set, it defaults to the empty element\nof your synthesized type, or to the value of the parent node in the case\nof inherited attributes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserras%2Ft-regex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fserras%2Ft-regex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserras%2Ft-regex/lists"}