{"id":33237091,"url":"https://github.com/dtenny/rexxparse","last_synced_at":"2025-12-16T02:01:52.088Z","repository":{"id":239978252,"uuid":"801155509","full_name":"dtenny/rexxparse","owner":"dtenny","description":"A string parsing tool inspired by the REXX PARSE construct.","archived":false,"fork":false,"pushed_at":"2024-05-16T03:02:21.000Z","size":40,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-06-09T15:11:22.382Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Common Lisp","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dtenny.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-15T17:44:24.000Z","updated_at":"2024-06-02T15:05:46.000Z","dependencies_parsed_at":"2024-05-16T06:23:06.100Z","dependency_job_id":null,"html_url":"https://github.com/dtenny/rexxparse","commit_stats":null,"previous_names":["dtenny/rexxparse"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dtenny/rexxparse","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtenny%2Frexxparse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtenny%2Frexxparse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtenny%2Frexxparse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtenny%2Frexxparse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dtenny","download_url":"https://codeload.github
.com/dtenny/rexxparse/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtenny%2Frexxparse/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27758422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-16T02:00:10.477Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-11-16T19:00:28.321Z","updated_at":"2025-12-16T02:01:51.997Z","avatar_url":"https://github.com/dtenny.png","language":"Common Lisp","readme":"# TL;DR Purpose\n\nA DSL to concisely scan/tokenize, extract, and transform semi-structured string data, and\nbind the results to variables. Inspired by the REXX PARSE command.\n\nSome simple if not particularly inspired examples:\n\n    (parse \"The quick brown fox\" (_ _ color animal)\n      (format t \"The color of the ~a is ~a~%\" animal color))\n\n    =\u003e The color of the fox is brown\n\n    (defvar *log-line* \"2024-Aug-12: [ERROR] Some stupid log error\")\n    (parse *log-line* (year \"-\" \"[\" severity \"] \" rest)\n      (when (string= severity \"ERROR\")\n        (format t \"WARNING WILL ROBINSON! ~s happened in ~a~%\" rest year)))\n\n    =\u003e WARNING WILL ROBINSON! 
\"Some stupid log error\" happened in 2024\n\n    (parse \"Meal total: $23.12\" (\"$\" (float dollars))\n        (format t \"Amount with 15 \u0026 20 percent tips: $~,2f, $~,2f~%\"\n            (* dollars 1.15) (* dollars 1.20)))\n\n    =\u003e Amount with 15 \u0026 20 percent tips: $26.59, $27.74\n\n# About\n\nA long time ago there was a novel scripting language that ran on the IBM\nVM/CMS operating system known as [REXX](https://www.rexxla.org/).\n\nOne of the things I always liked about REXX in its era of pre-regexp\nscripting languages was the `PARSE` statement. In its simplest form, PARSE\nis a very nice way to parse strings with delimited or positional data and\nthen bind the matching substrings to variables.\n\nThis package attempts to reproduce REXX's `PARSE` statement as a Common Lisp DSL.\n\nThere's a bit of a zen thing to `PARSE`. Pattern matching is almost\nthe opposite of a regexp. Instead of specifying patterns for what you want\nto match, you specify patterns for the bits that are not of interest, and\nwhat gets bound as a match is the stuff in between those uninteresting bits.\n\nFor example, `(parse \"12:00:00\" (hh \":\" mm \":\" ss)) =\u003e (\"12\" \"00\" \"00\")`\nmatches on the colons; the other tokens are just variable names that\nreceive the text matched _around_ the colons. (This example doesn't\nshow the variables in use; with no body, `PARSE` defaults to returning a list\nof the matched variables.)\n\n`PARSE` is about scanning strings and binding desired subsequences\nto variables in the style of REXX.  It is not intended for Lisp syntax\nparsing, nor will it replace regexps where you need to express complex\npatterns. It is better suited for word-splitting and fixed-format data.\n\nThat said, `PARSE` can sometimes express some parsing tasks in a clearer and\nshorter way, and has other useful capabilities such as its ability to\nact like a programmable tape reader (e.g. 
reading a length descriptor\nfrom the input and then extracting a substring of the specified length).\n\nRealistically, this package probably adds little to what you can piece\ntogether with other Lisp parsing and/or pattern-matching packages, but if\nyou liked REXX, then perhaps you'll like this macro with its style of\nbinding and pattern specifications. This package is also regexp-free by\ndesign. Some alternatives with overlapping capabilities, including regexp-based ones, are mentioned below.\n\n# Tested platforms\n\nTests ran without issues on the following implementations, except as noted.\n\n* SBCL\n* ECL\n* CCL\n* ACL\n* LISPWORKS\n* ABCL - with the following mild warning about the `parse-float` package\n  when loading, but otherwise okay:\n\n    ; Loading \"rexxparse-test\"\n    ; Caught BAD-SYSTEM-NAME:\n    ;   System definition file #P\"/home/dave/quicklisp/dists/quicklisp/software/parse-float-20200218-git/parse-float.asd\" contains definition for system \"parse-float-tests\". Please only define \"parse-float\" and secondary systems with a name starting with \"parse-float/\" (e.g. \"parse-float/test\") in that file.\n    ; Compilation unit finished\n    ;   Caught 1 WARNING condition\n\n\n# REXX Compatibility\n\nI have endeavored to make the basic string and position parsing compatible\nwith REXX semantics. So on the very slim chance you're a former REXX\nprogrammer using Lisp, hopefully you will feel at home.\n\nThis was also the hardest part of the project because the REXX semantics\nare sometimes subtle. I was never particularly knowledgeable about REXX to\nbegin with, and the REXX documentation is a bit hit-or-miss on some\ndetails.  At times I was guessing at black box behavior. I've tried to boil\ndown the main rules in a section labeled \"Parse Rules 101\" below.\n\nAmong the REXX compatibility features of REXXPARSE is tolerance of edge\ncases like rebinding the same variable multiple times, position patterns\nwhich are out of bounds of the string, and so on.  
About the only\nrestriction is that relative position fixnums must not be negative, which\nis in keeping with REXX semantics. Otherwise it tries not to complain about\nmundane things in your templates.\n\nIf you think you've found a bug, try your PARSE with Open Object REXX and\nsee what it does. My goal is to match its semantics for any functionality\nshared between the two; note, however, that the Lisp version has additional\ncapabilities that have no REXX counterpart to compare against.\n\n# Alternative text parsing packages\n\nLisp has plenty of great tools that already do parsing; here are a couple\nworth considering.\n\n## cl-ppcre\n\nIf regexps are your thing, you could also use the\n[cl-ppcre](https://edicl.github.io/cl-ppcre/#register-groups-bind)\n`register-groups-bind` construct.  It probably performs just\nas well (or better, given its years of fine tuning; I have no idea). It even has its own\nflavor of transforms that can be applied to the match before binding.\n\n## scanfcl\n\nThere's also the Common Lisp [scanf](https://github.com/splittist/scanfcl)\ntool, which provides a Lisp equivalent to the C `scanf` family of functions\nand can parse numbers for you, but does not provide bindings\nand suffers from the broader limitations of `scanf`'s parsing capabilities.\n\n# Example comparison of regexp/scanf/PARSE\n\nHere is an example of parsing a simple text string with regexps and\n`scanf`, followed by the same parse done with `PARSE`.\n\nLet's use the following text string, from which we want to\ntease out the year/month/day and error message components:\n\n    (defvar *text*\n      \"2024/02/23 17:35:42.022 -  unable to locate '/usr/local/examples/' directory\")\n\nNote the extra blank space after the hyphen as well.\n\n## Using cl-ppcre `register-groups-bind`\n\n    (cl-ppcre:register-groups-bind (year month day error-msg)\n        (\"(\\\\d+)/(\\\\d+)/(\\\\d+).* -  (.*)\" *text*)\n      (list year month day error-msg))\n\n    =\u003e (\"2024\" 
\"02\" \"23\" \"unable to locate '/usr/local/examples/' directory\")\n\nNice enough, with the usual cross-eyed issues of writing regexps.\n\n## Using `scanf`\n\n    (scanfcl:sscanf *text* \"%d/%d/%d %*s -  %s\")\n\n    =\u003e (2024 2 23 \"unable\")\n\nScanf is nice because it converts matched text to numeric types.\n`REXXPARSE:PARSE` can do that as well via transforms, a REXXPARSE extension\nto basic REXX capabilities.\n\nNote that the scanf example is able to suppress assignment of some text with\nthe '*' modifier, but it fails to capture the desired message because the\nmessage contains whitespace (`%s` stops at the first blank). You could use a\nfixed-width `%s` or `%c` if you can make assumptions about the width, but that\nis not generally compatible with most service log content.  If your scanf\nsupports character sets, you could use those too. Still, it isn't super\nfriendly for reading delimited substrings the way we do with PARSE.\n\n## Using `PARSE`\n\n### Pure REXX PARSE\n\nThe original (NOT LISP!) REXX syntax would be:\n\n    PARSE *text* year \"/\" month \"/\" day . \"-\" error\n\nIn the above statement, `*text*` is known as the source (to be matched),\nand the remainder of the statement is known as the \"template\".  In REXX,\nthe period is a placeholder; in Lisp we use '_' (underscore) because periods\nhave special meaning to the Lisp reader.  In the above example, the\nperiod would match the timestamp text.\n\nThe template contains symbols naming variables to be bound, and strings to\nbe matched in the source text such that they delimit the text of interest\nto be bound.\n\n### Lisp-styled REXX PARSE\n\nThe general syntax of PARSE is\n\n    (parse \u003csource-string\u003e (\u003ctemplate-elements\u003e) \u003coptional-body\u003e)\n\nThe body may begin with optional declarations for the template variables,\nvia an implicit enclosing `locally`. The variables will normally hold strings unless you\nare using transforms, but PARSE makes no such declarations implicitly.\n\nHere is a simple text parse without a body. 
The underscore is as mentioned above:\n\n    (parse *text* (year \"/\" month \"/\" day _ \"-\" error))\n\nIf all you want is a list of the bound values, you can omit all forms\nafter the template and PARSE will return that list, so the above\nreturns\n\n    =\u003e (\"2024\" \"02\" \"23\" \"unable to locate '/usr/local/examples/' directory\")\n\nValues are returned in the order of the variables specified in the template.\nText conceptually (but not physically) bound to the placeholder `_`\nis not included in the result.\n\nOne of the main points of PARSE is to lexically bind variables for you\nso you don't have to go and fetch them from a list with `destructuring-bind`\nor other tools. For example:\n\n    ;; mock snippet dealing with some error noted in *text*\n    (parse *text* (\"unable to locate '\" path \"' directory\")\n      (cerror \"Create the directory ~s and continue\"\n              \"The directory ~s did not exist\"\n              path))\n\n    =\u003e\n\n    The directory \"/usr/local/examples/\" did not exist\n       [Condition of type SIMPLE-ERROR]\n\n    Restarts:\n     0: [CONTINUE] Create the directory \"/usr/local/examples/\" and continue\n     1: [RETRY] Retry SLIME REPL evaluation request.\n     2: [*ABORT] Return to SLIME's top level.\n     3: [ABORT] abort thread (#\u003cTHREAD tid=88446 \"repl-thread\" RUNNING {10084300A3}\u003e)\n\n#### Template variables, bindings vs. 
assignment\n\nSymbols acting as variables in the template, except for '_', are _bindings_\nintroduced by `LET` and initialized with\n`REXXPARSE:*UNMATCHED-BINDING-VALUE*`.\n\nHowever, depending on the use of the symbols in the template, they may\nundergo multiple assignments, either to text matched by the parse, or to\nthe result of transformations on the parsed text.\n\nThe '_' does not result in a binding: no `_` symbol is bound on the stack,\nand any template matches for this symbol are not extracted or saved to any variable.\n\n#### REXX variables vs. Lisp s-expressions\n\nIf you're reading REXX documentation (or are otherwise familiar with it), such\nas the [Open Object REXX Reference](https://rexxinfo.org/reference/articles/oorexxref.pdf),\nnote that the use of parenthesized forms differs between REXX and\nREXXPARSE:PARSE.  Where REXX uses a parenthesized expression to reference a\nlanguage variable, REXXPARSE gives parenthesized forms in templates other\nsyntactic meanings, e.g. `(+ x)` is a positional pattern that moves\nrightward `x` columns.  I imagine the confusion will only occur to\npeople who have been writing a lot of REXX recently.\n\n### Word-oriented tokenization\n\nThe basic behavior of PARSE favors matching tokens delimited by\nspaces. Absent specific patterns from you, the spaces around tokens bound\nto variables are discarded.  Thus\n\n    (parse \"Now  is the time\" (now is the-time))\n    =\u003e (\"Now\" \"is\" \"the time\")\n\nNote the multiple spaces between \"Now\" and \"is\"; together they divide the\ntokens and are discarded. This is different from a pattern indicating a space,\ne.g.\n\n    (parse \"Now  is the time\" (now \" \" is \" \" the-time))\n    =\u003e (\"Now\" \"\" \"is the time\")\n\nHere 'is' is matched to the text between the point matched by the pattern\non the left and the point matched by the pattern on the right. 
The two\npatterns match consecutive spaces and produce a zero-length binding.\nDon't let it mess with your head too much; this is a fairly contrived example.\n\n### More text than bindings\n\nThe last binding variable will be assigned any unmatched tail of the source\nstring, e.g.\n\n    (parse \"a b c\" (a b))\n    =\u003e (\"a\" \"b c\")\n\nIn this situation, the text bound to the tail variable will not have spaces trimmed.\n\n### More bindings than text\n\nIf there are unused variables because there are fewer words in the\nsource than there are variables in the template, unused variables will\nbe bound to `REXXPARSE:*UNMATCHED-BINDING-VALUE*`, which defaults to an\nempty string (in keeping with REXX semantics).  You can change this\nbehavior by rebinding `*UNMATCHED-BINDING-VALUE*`.\n\n    (parse \"a b\" (a b c))\n    =\u003e (\"a\" \"b\" \"\")\n\n### Consecutive bindings and/or patterns\n\nYour template may have binding sequences without interleaved patterns, in which case\nthe implicit word splitting pattern applies.  It may also have pattern\nsequences without interleaved binding variables, which may be useful if,\nfor example, you're looking to advance across like tokens, e.g.\n\n    (parse \"I want the text following the second occurrence of 'text', this text.\"\n            (\"text\" \"text\" the-rest))\n    =\u003e (\"', this text.\")\n\n### Parse Rules 101\n\nThe simplest form of parsing template consists of a list of variable names.\nThe string being parsed is split up into words (characters delimited by\nblanks), and each word from the string is assigned to a variable in\nsequence from left to right. Leading blanks are removed from each word in\nthe string before it is assigned to a variable, as is the blank that\ndelimits the end of the word.\n\nBeyond the simple case, there are some rules to remember for the myriad\nedge cases and features related to PARSE:\n\n1. 
If there is one variable and no pattern, the variable matches the whole\n   source string (no whitespace characters are removed).\n\n2. If there are more variables than words, excess variables are bound to\n   `*UNMATCHED-BINDING-VALUE*`.\n\n3. If there is more text than variables would match, the last variable is\n   bound to all remaining text.  This is sometimes called the \"tail match\" rule.\n   Tail matches never eat spaces; they preserve the remainder of the source\n   string to be matched.\n\n4. [SUBTLE, CRUCIAL] Any explicit pattern (with a match in the source\n   string) creates a logical break in the source string such that the\n   variable immediately to the left of the pattern is treated as a\n   \"tail match\" situation on the substring terminated by the pattern.\n\n   Moreover, all variables to the left of the pattern apply only to the\n   substring to the left of the pattern, e.g.\n\n   `(parse \"a b c x g\" (a b \"x\" g)) =\u003e (\"a\" \" b c\" \" g\")`\n\n5. Where no pattern is given between two variables, or between a variable\n   and the beginning or end of the source string, an implicit \"word\n   splitting\" takes place.  Word splitting eats spaces before a token to be\n   matched, and one space after the token.\n\n6. An empty string is never found; it always matches the end of the source\n   string.  Specifying an absolute position of 1 as the pattern following a\n   variable has a similar effect to an empty string pattern: it leaves\n   the cursor positioned such that you can match the source string again.\n\n   `(parse \" a b c \" (a \"\" b)) =\u003e (\" a b c \" \"\")`\n   `(parse \" a b c \" (a 1 b)) =\u003e (\" a b c \" \" a b c \")`\n\n7. Absolute positions less than one are treated as one.\n   Absolute positions greater than the source string length are treated\n   as being the string length.\n\n8. Relative position expressions, e.g. `(- \u003cexp\u003e)`, require the `\u003cexp\u003e`\n   to be a non-negative fixnum or a string that can be converted to a\n   non-negative fixnum.  
Absolute positions expressed with `(= \u003cexp\u003e)` have\n   the same rules.\n\n9. Relative and absolute positional patterns are interchangeable _with one exception_.\n\n   Normally, template parsing of string matches skips the text matched by\n   the string pattern.  However, a template sequence of the form\n   `\"string\" variable \u003crelative-positional\u003e` does NOT skip the string\n   pattern text when assigning to the variable, so the pattern text\n   will appear in the variable.  For example (with non-relative examples too):\n\n   `(parse \" a b c \" (\"b\" b)) =\u003e (\" c \")`        ; \"b\" not included, no relative positional\n\n    ;; '5' is effectively equal to the source scanning start position when its pseudo-pattern\n    ;; is matched, which means an empty string match, which is a break/tail-position behavior.\n   `(parse \" a b c \" (\"b\" b 5)) =\u003e (\" c \")`      ; from end of \"b\" to 5 is empty, full break\n   `(parse \" a b c \" (\"b\" b 7)) =\u003e (\" c\")`       ; space past \"b\" to 7, 2 chars\n   `(parse \" a b c \" (\"b\" b (+ 1))) =\u003e (\"b\")`    ; from position of b to position+1\n   `(parse \" a b c \" (\"b\" b (- 1))) =\u003e (\"b c \")` ; from position of b to end of string\n\n10. Template expressions may reference variables bound by preceding\n    template matches. See the section on `Length Positional Patterns` for an example.\n\n### Positional template directives\n\nPatterns may also be positional directives, where integers specify absolute\nor relative positions in the source string, relative positions being\nrelative to the start of the last pattern matched. Positions are generally\nused for fixed-length subfields in strings, but can also be used to re-scan\nthe source.\n\nLike string patterns, positions identify points at which the source string\nis split; the only difference is that the length of the match is zero.  
Also like string patterns,\nvariables bracketed by patterns will not be string trimmed.\n\n    ;                1         2         3\n    ;       1234567890123456789012345678901234\n    (parse \"Brimfield    Massachusetts   10101\"\n      (city 14 state 30 zip))\n    =\u003e (\"Brimfield    \" \"Massachusetts   \" \"10101\")\n\n_Absolute_ positions may be specified as positive integer literals.\nThe above example specifies position matches for columns 14 and 30.\nAbsolute positions are all 1-based integer values, i.e. a column ordinal. Subtract\none mentally for Lisp array indices. (This choice is for REXX compatibility.)\n\nTo specify positions with integer-valued variables\ninstead of integer literals, you must supply an s-expression whose car is\none of `+`, `-`, `=`, followed by an s-expression that is evaluated at\nruntime (not macroexpansion time) to produce an integer to be interpreted\nas the relative or absolute position. The `+` and `-` expressions indicate\n_relative_ positions, while `=` indicates an absolute position.\n\nExamples:\n\n    ;; City occupies columns 1-13 inclusive.\n    ;; State occupies columns 14-29 inclusive.\n    ;; '+' indicates position relative to the prior pattern match position.\n    (parse \"Brimfield    Massachusetts   10101\"\n      (city (+ 13) state (+ 16) zip))\n    =\u003e (\"Brimfield    \" \"Massachusetts   \" \"10101\")\n\n    ;; Mixing absolute positions 30 and 31 with relative offsets,\n    ;; reparsing the first '1' twice.\n    (parse \"Brimfield    Massachusetts   10101\"\n      (30 one-a 31 (- 1) one-b (+ 1)))\n    =\u003e (\"1\" \"1\")\n\n    ;; Variables must be used through the parenthesized expression,\n    ;; otherwise they would be indistinguishable from variables to be bound.\n    ;; '=' indicates absolute positions.\n    (defvar *state-column* 14)\n    (defvar *zip-column* 30)\n    (parse \"Brimfield    Massachusetts   10101\"\n      (city (= *state-column*) state (= *zip-column*) zip))\n    =\u003e 
(\"Brimfield    \" \"Massachusetts   \" \"10101\")\n\n\nAny positional directive that would precede the first source column\n(i.e. is \u003c 1) is treated as 1.\n\nAny positional directive that would exceed the length of the\nsource string is treated as the string length, matching the\nremainder of the string.\n\nIt is an error for any net position value to exceed the range of a fixnum.\n\n### Positional pattern data types\n\nAll positional expressions must be integers in the range of non-negative\nfixnums, or s-expressions that resolve to those values. This constraint is\nrelaxed for `+`, `-`, and `=` patterns, as well as `\u003e` and `\u003c` (described\nbelow), so that strings matched while parsing may be used later in the\ntemplate as numeric positional directives. In such uses,\nthe value bound at one step of the parse acts as input controlling later\nsteps of the parse.\n\nAllowing strings as positional values is a shortcut that avoids\nlittering your template with `parse-integer` calls on previously\nmatched text.  String-to-fixnum conversions in positional templates that do\nnot resolve to non-negative fixnums will result in a continuable error\nbeing signalled. Conversions are performed with `cl:parse-integer` and may\nsignal a `cl:parse-error` condition if the text is not parseable as\nan integer; note that this `parse-error` is not continuable.\n\nSee the next section for examples that match integer data in the source\nstring and use those integers for subsequent match activity.\n\n### Length Positional Patterns\n\nA `length positional pattern` is a number in a `\u003c` or `\u003e` pattern sexp,\nsimilar to the `+`, `-`, and `=` pattern forms. I'm not sure why REXX\ndistinguishes these from `-` and `+` positional patterns; they are identical\nin behavior except for one situation noted below.\n\nAs with `-` and `+`, the number specifies the length at which the source\nstring is to be split relative to the current position. 
`\u003e` and `\u003c`\nindicate movement right or left, respectively, from the start of the string\nor from the position of the last match.\n\nThe `\u003e` length pattern and the `+` relative positional pattern are\ninterchangeable except in the special case of a zero value. A `(\u003e 0)` pattern\nsplits off a null (empty) string and leaves the match position\nunchanged, whereas a `(+ 0)` pattern also leaves the match position\nunchanged, but doesn't split the string.  In essence, `(\u003e 0)` says \"match an\nempty string\" whereas `(+ 0)` says \"advance the scan zero characters\", matching\nwhatever follows.\n\nThis string splitting behavior is useful for parsing string subfields\nwhose lengths are also encoded in the string.\n\nThe following example shows the difference between `(\u003e 0)` and `(+ 0)`;\nnote the different matches for `middle`:\n\n    ;; Parsing with length patterns\n    (parse \"04Mark0005Twain\"\n           (len (+ 2) first (\u003e len) len (+ 2) middle (\u003e len) len (+ 2) last (\u003e len))\n      (list first middle last len))\n    =\u003e (\"Mark\" \"\" \"Twain\" \"05\")\n\n    ;; Parsing with relative patterns only\n    (parse \"04Mark0005Twain\"\n           (len (+ 2) first (+ len) len (+ 2) middle (+ len) len (+ 2) last (+ len))\n      (list first middle last len))\n    =\u003e (\"Mark\" \"05Twain\" \"Twain\" \"05\")\n\nWhile `\u003c` is similar to `-`, the application of the match/extract process\ndiffers.  
To achieve the effect of `\u003c` on a region of text with `-`, you\nmust use a `-`/`+` pair, and the resulting position in the source differs, as in the\nfollowing example:\n\n    ;; Parsing with length patterns\n    (parse \"12345.6789\" (\".\" digit (\u003c 1) rest)) =\u003e (\"5\" \"5.6789\")\n    ;; Parsing with relative patterns\n    (parse \"12345.6789\" (\".\" (- 1) digit (+ 1) rest)) =\u003e (\"5\" \".6789\")\n\n`\u003c` is similar to matching a string literal, _without_ advancing the next\nposition to be scanned after binding.\n\n### Transformations (REXXPARSE extension)\n\n`REXXPARSE:PARSE` supports transformations on matched strings before they\nare assigned to variables.  Transforms are a REXXPARSE Lisp extension and\nnot part of the basic REXX PARSE capability.\n\nThe syntax for an assignment based on a predefined transformation is:\n\n    (\u003ctransform\u003e variable)\n\nused where you would otherwise just have a plain (unparenthesized) variable\nto be bound.  Note that the above uses `\u003ctransform\u003e` as a non-terminal BNF token\nrepresenting many possible pre-defined transformations. There is also a `TRANSFORM`\nterminal symbol with specific user-defined transformation semantics.\n\nTransforms have the same syntax as list-form patterns but are in\nfact binding forms. 
PARSE distinguishes patterns from transform-augmented\nbindings by checking whether the symbol name of the list's CAR is a known\ntransform symbol.\n\nTransforms are a convenience for common parse situations; you\ncould always do the transformations in the `\u0026BODY` of the parse if you need\ndifferent transformation semantics than those pre-defined by REXXPARSE, or\nsimply find it confusing that transform syntax resembles pattern syntax.\n\n    (parse \"some text with numbers: 1.0 2\" (_ \": \" (float x) (integer n))\n      (format t \"~f is a ~s, ~d is a ~s~%\" x (type-of x) n (type-of n)))\n\n    =\u003e\n    1.0 is a SINGLE-FLOAT, 2 is a (INTEGER 0 4611686018427387903)\n    NIL\n\nThe `(float x)` and `(integer n)` expressions are DSL syntax to invoke transformations\non the text corresponding to variables `x` and `n`, and to assign the\ntransformation results to those variables.  The set of supported\ntransformations is described below.\n\nTransformations do not currently nest, i.e. you _cannot_ do `(LOWER (KEBAB x))`.\nIf you need to apply more complicated transformations, see 'User defined transforms' below.\n\nTransform expressions using the `_` symbol are effectively NO-OPs.  No\ntext is extracted, and no transform function is run.\n\n#### Pre-defined transforms\n\n* UPPER       - uppercases the extracted text.\n* LOWER       - lowercases the extracted text.\n* SNAKE       - converts hyphens to underscores.\n* KEBAB       - converts underscores to hyphens.\n* LTRIM       - removes leading spaces.\n* RTRIM       - removes trailing spaces.\n* TRIM        - removes leading and trailing spaces.\n* INTEGER     - converts extracted text to an integer.\n* FLOAT       - converts extracted text to a single-float.\n* DOUBLE      - converts extracted text to a double-float.\n* KEYWORD     - converts extracted text to a keyword.\n\nThe floating point conversions are done using the `:parse-float` package;\nthey do _not_ perform unsafe `READ`s. 
INTEGER conversion is done using `PARSE-INTEGER`.\n\n`SHORT-FLOAT` and `LONG-FLOAT` conversions are not supported; the\n`:parse-float` package doesn't seem to support them, on SBCL at least. If\nyou need these representations, you'll probably want to use\n`*READ-DEFAULT-FLOAT-FORMAT*` bindings with a user-defined transform that\nobserves it and manages the conversion.\n\nOf course, you could also just supply a BODY to the `PARSE` form and do the\nconversions in the body; there's no need whatsoever to use user-defined\ntransforms except perhaps to abbreviate code if the transformation is used a lot.\n\nThe `LTRIM`, `RTRIM`, and `TRIM` transforms use the Common Lisp `STRING-LEFT-TRIM`,\n`STRING-RIGHT-TRIM`, and `STRING-TRIM` functions respectively, supplying\n`REXXPARSE:*TRIM-CHARACTER-BAG*` as the character bag argument. This variable is\nexported so that you may bind it to other characters (outside of the\n`PARSE` form), but note that the binding will affect all trim transforms\nin its scope.\n\nThe KEYWORD transform does no case conversion, so it's easy to make\nsymbols in unexpected cases if you aren't careful. If you want to\nuppercase or lowercase the text before the transform makes a keyword of it, you\ncould use the `:UPPER` or `:LOWER` options to `PARSE` (though that will change\nthe case of the whole source string).  Or you can just do what you want in\nthe `PARSE` body.\n\n#### User defined transforms\n\nThere is a special transform operator, `TRANSFORM`, which exists to invoke\nuser-supplied transformation functions.\n\n    (transform \u003csymbol\u003e \u003cfunction\u003e)\n\nThis invokes `function` on the text extracted by the parse, and assigns the\nresult of the transformation function to `symbol`.  `function` must be a\n[function designator](https://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator)\nfor a function of one argument, which will always be a string. 
\n\n    ;; User defined transform example\n    (defun stupid (str)\n      (declare (ignore str))\n      \"Stupid!\")\n    (parse \"Don't call me dull.\" (_ _ _ (transform s 'stupid) \".\"))\n    =\u003e (\"Stupid!\")\n\n## Full PARSE syntax\n\nThe general form of a `PARSE` invocation (pardon the weak BNF) is below.\nOnly the `\u003csource\u003e` expression is required, and it must yield a string.\n\n    PARSE \u003coptions\u003e \u003csource\u003e \u003ctemplate\u003e\n\n    \u003coptions\u003e ::=\n    \u003coptions\u003e ::= :UPPER\n    \u003coptions\u003e ::= :LOWER\n    \u003coptions\u003e ::= :CASELESS\n    \u003coptions\u003e ::= :USING (\u003cvar\u003e ...)\n    \u003coptions\u003e ::= (:USING \u003cvar\u003e ...)\n    \u003coptions\u003e ::= :USING-VECTOR (\u003cvar\u003e ...)\n    \u003coptions\u003e ::= (:USING-VECTOR \u003cvar\u003e ...)\n\n    \u003csource\u003e ::= string-literal\n    \u003csource\u003e ::= s-exp\n\n    \u003ctemplate\u003e ::=\n    \u003ctemplate\u003e ::= \u003ctemplate-expression\u003e\n\n    \u003ctemplate-token\u003e ::= \u003cbinding\u003e\n    \u003ctemplate-token\u003e ::= \u003cpattern\u003e\n\n    \u003ctemplate-expression\u003e ::= \u003ctemplate-token\u003e\n    \u003ctemplate-expression\u003e ::= \u003ctemplate-expression\u003e \u003ctemplate-token\u003e\n\n    \u003cpattern\u003e ::= string-literal\n    \u003cpattern\u003e ::= \u003cposition\u003e\n    \u003cpattern\u003e ::= ( $ \u003cs-exp\u003e )\n\n    \u003cposition\u003e ::= position-integer-literal\n    \u003cposition\u003e ::= ( + \u003cposition-integer\u003e )\n    \u003cposition\u003e ::= ( - \u003cposition-integer\u003e )\n    \u003cposition\u003e ::= ( = \u003cposition-integer\u003e )\n    \u003cposition\u003e ::= ( \u003e \u003cposition-integer\u003e )\n    \u003cposition\u003e ::= ( \u003c \u003cposition-integer\u003e )\n\n    \u003cposition-integer\u003e ::= position-integer-literal\n    \u003cposition-integer\u003e ::= \u003cs-exp\u003e\n\n    \u003cbinding\u003e ::= symbol\n    \u003cbinding\u003e ::= \u003ctransformation\u003e\n\n    \u003ctransformation\u003e ::= ( \u003cbuilt-in-transformation\u003e symbol )\n  
  \u003ctransformation\u003e ::= ( TRANSFORM symbol function )\n    \n    ;; String producing case(like) transformations\n    \u003cbuilt-in-transformation\u003e ::= UPPER\n    \u003cbuilt-in-transformation\u003e ::= LOWER\n    \u003cbuilt-in-transformation\u003e ::= SNAKE\n    \u003cbuilt-in-transformation\u003e ::= KEBAB\n\n    ;; Non-string producing transformations\n    \u003cbuilt-in-transformation\u003e ::= INTEGER\n    \u003cbuilt-in-transformation\u003e ::= DOUBLE\n    \u003cbuilt-in-transformation\u003e ::= FLOAT\n    \u003cbuilt-in-transformation\u003e ::= KEYWORD\n    \nSymbols are compared by name (so package doesn't matter), but upper case symbol names are expected.\n\n1. `\u003csource\u003e` is evaluated to produce a string, unless it is already a string.\n2. `\u003cpattern\u003e` is used to find the text region in `\u003csource\u003e` to be bound to\n   the symbol in `\u003cbinding\u003e`.\n3. Once a region of text is matched by a pattern, it is assigned to the\n   `\u003cbinding\u003e` symbol. If there was no match or the source text is\n   exhausted, the symbol is bound to `REXXPARSE:*UNMATCHED-BINDING-VALUE*`. \n   Do not mutate returned value. \n4. `\u003cbinding\u003e` can also be an sexp of the form of the form `(TRANSFORM function symbol)`, \n   in which case `function` is run to transform the matched text before assigning it to `symbol`.\n\n## `($ \u003cs-exp\u003e)`: variables (or other s-exps) as patterns\n\nPatterns of the form `($ \u003cs-exp\u003e)` are used to indicate that the expression\n`\u003cs-exp\u003e` is to be evaluated to produce a string to be used as a\npattern. It is needed because naked symbols in the template are interpreted\nas binding names, and so are not normally evaluated.  
`$` causes them to be\nevaluated and treated as string-literal patterns, similar to a variable\nevaluation directive in various other languages.\n\nExample:\n\n    (let ((x \"brown\"))\n      (parse \"the quick brown fox\" (start ($ x) end)))\n    =\u003e (\"the quick \" \" fox\")\n\nThe `$` form is only needed to evaluate symbols in patterns that don't\notherwise evaluate them. Positional pattern directives such as `(+ x)`\nalready evaluate their arguments. `(+ ($ x))` is not only unnecessary, it\nalso wouldn't work (as `$` is a pattern directive, not a function for arbitrary\ns-exp evaluation).\n\n# Differences from REXX' PARSE \n\n## The Lisp bits\n\nFirst of all, you're in lisp.  So PARSE is a DSL of a style similar to\nCommon Lisp's advanced LOOP macro.  If you don't like LOOP, you may not\nlike PARSE.\n\nSecond, there is an extensible mechanism you can use to specify both the\npatterns matched in the template and _transformations_ on the bound\nvalues.\n\n## PARSE UPPER|LOWER|CASELESS =\u003e PARSE :UPPER|:LOWER|:CASELESS\n\nParse options like UPPER are specified as keywords and not plain symbols.\n\n## The source string can be any s-expression that yields a string.\n\nREXX required a lot of additional syntax to declare how the string\nexpression was interpreted, e.g. `PARSE VAR`.  REXXPARSE does not suffer\nfrom this: one source s-expression fits all, so long as it yields a string.\n\nNIL is not permitted for the source string.\n\n## A missing template renders PARSE a NO-OP\n\nAs there is no `PARSE LINEIN` or `PARSE PULL` in this implementation,\nif there is no template the `PARSE` invocation is a NO-OP, or as close to\nit as we can make it.\n\nIf the template is missing or does not specify any variables other than\n`_`, the `\u003csource\u003e` expression is not evaluated.\n\n## There is no multi-string comma operator\n\nThe original REXX PARSE could parse multiple strings in one instruction, with commas\nseparating the template for each string. 
This is not supported.\n\n## Positional Position Syntax\n\nIn REXX, a plus (`+`) or minus (`-`) before an integer indicates a\n_relative_ position to be matched in the template. The presence of a `+`\nis not the same as a positive value: a simple `10` indicates an absolute\nposition, whereas `+10` indicates a relative position.\n\nDoing this in lisp requires that `+` be a separate token to survive the\nreader, which reads `+10` as the plain integer `10`. So we could represent a relative plus position as `(+ 10)`, `+ 10`\nand so on.\n\nThen there's the evaluation aspect if you're using an expression\nvalue, e.g. you want to say `+myval` for some variable `myval`. REXX\nwould render that as `+(myval)` according to its syntax.\n\nREXXPARSE:PARSE allows the following forms of positional patterns.\n\nFor shorthand positional patterns with constant values, we allow keywords\nlike these: `:+10`, `:-10`, `:=10`, `:10`. You can also use plain\nintegers, possibly negative, e.g. `10` and `-10`, but relative positions\ncannot be inferred from positive integers, so `10` is always absolute.\n\nThe long-form syntax for positional patterns, which can indicate\nrelativity as well as take the value of an arbitrary expression `x`, is as follows.\n`x` may also be any number for which integer coercion semantics are defined.\n\n    (+ x)\n    (- x)\n    (= x)\n\n## Additional PARSE options\n\nREXXPARSE:PARSE accepts a number of options that change the\nbehavior of the parse. Options are specified as (optional) keywords that\nprecede the source string expression to be parsed, i.e.\n\n    (PARSE [:option1 ... :optionN] source-sexp (template) ...)\n\nSome options have accompanying values, some do not, and some may be expressed\nas lists.  All options, and only options, are triggered by keywords in the\nPARSE arguments.\n\nThe following sections describe the options.  
I have attempted to attribute\noptions to the REXX language versions that introduced them; please\nprovide corrections if the attributions are incorrect.\n\nAll options specified are bound to `REXXPARSE:*OPTIONS*` for the scope of\nthe PARSE expression, so that user-extensible patterns or other operators\ncan check for options that might require consideration, such as :CASELESS\ncomparisons.\n\nThe traditional REXX options (:UPPER, :LOWER, :CASELESS) are plain\nkeywords that are _not accompanied by values_; their presence triggers the intended\nbehavior. Other options may accept values; refer to the documentation on\nindividual options for details.\n\n### PARSE :UPPER\n\n`PARSE :UPPER` converts lowercase a-z to uppercase before parsing. Note that\nthis represents a transformation (by copying) of the source string,\nand matched content will by definition be upper case, as the source string\nwill no longer have any lowercase text.\n\n    (parse :upper \"A b C d\" (w \"C\" r)) =\u003e (\"A B \" \" D\")\n\nNote that specifying lowercase string patterns will foil matching, because\n:UPPER has no effect on the pattern text or the comparisons used.\n\n    (parse :upper \"A b C d\" (w \"c\" r)) =\u003e (\"A B C D\" \"\")\n\n`UPPER` was the only option in the original REXX PARSE construct.\n\nThe `:UPPER` option is mutually exclusive with the `:LOWER` option.\n\n### PARSE :LOWER\n\n`PARSE :LOWER` converts uppercase a-z to lowercase before parsing. 
Note that\nthis represents a transformation (by copying) of the source string,\nand matched content will by definition be lower case, as the source string\nwill no longer have any uppercase text.\n\n    (parse :lower \"A b C d\" (w \"c\" r)) =\u003e (\"a b \" \" d\")\n\nNote that specifying uppercase string patterns will foil matching, because\n:LOWER has no effect on the pattern text or the comparisons used.\n\n    (parse :lower \"A b C d\" (w \"C\" r)) =\u003e (\"a b c d\" \"\")\n\nThe `LOWER` option was added by the NetRexx language specification.\n\nThe `:LOWER` option is mutually exclusive with the `:UPPER` option.\n\n### PARSE :CASELESS\n\n`PARSE :CASELESS` ignores case in comparisons. Unlike :UPPER and :LOWER,\nit does not transform the source string, but instead changes the\ncharacter equality predicates used for comparison.\n\n    (parse :caseless \"A b C d\" (w \"c\" r)) =\u003e (\"A b \" \" d\")\n\nThe `CASELESS` option was added by the Open Object REXX language specification.\n\nNote that this option may not be supported by pattern processors used as\nextensions to the REXXPARSE behavior.  Extensions may implement the desired\nbehavior by examining the value of `REXXPARSE:*OPTIONS*`, which is bound to the\noptions specified in the PARSE form.\n\n### PARSE NUMERIC (unsupported)\n\nThe IBM z/OS version of REXX supports a `parse numeric digits form fuzz`\npackaging of the `numeric` instruction.  This is not supported by REXXPARSE.\n\n### PARSE :USING (\u003cvar\u003e ...) or (:USING \u003cvar\u003e ...)\n\nThe `:USING` option indicates that for all symbols in the var list,\n`PARSE` should _not_ allocate bindings in its macroexpansion `LET` block,\nand should instead use vars which already exist in the environment.\nThis may be useful for performance in iterative code, or for other application logic reasons.\n\nWhen you request this behavior, the vars do not undergo any initialization\nstep by `PARSE`.  
If they are not assigned values by the\nmatch/extract/assign steps (because there are more variables than matches),\nthen whatever value they had going into `PARSE` is the\nvalue they will have in the body of `PARSE`.\n\nExample:\n\n    (let ((x nil)\n          (y t))\n      (parse :using (x y) \"abc\" (x)))\n    =\u003e (\"abc\")\n\nWith the side effect that X is now \"abc\", and Y, which was not matched\nor assigned, is still T.\n\n### PARSE :USING-VECTOR (\u003cvar\u003e ...) or (:USING-VECTOR \u003cvar\u003e ...)\n\nThe `:USING-VECTOR` option is similar to the `:USING` option. Symbols in\nthe list will not be allocated or initialized in the macroexpansion.\n\nHowever, in this case the symbols are expected to refer to fill-pointered\narrays when the PARSE macroexpansion is executed. Where an ordinary\n`:USING` symbol would be assigned with `SETQ` or `SETF`, assignments to\nsymbols named by `:USING-VECTOR` will be executed by\n`(VECTOR-PUSH \u003cvalue\u003e \u003cvar\u003e)`. The variable is typically reused for multiple template bindings.\n\nThe array must exist and be large enough to accept the new value(s), and you\nshould ensure the fill pointer is where you want it on entry to `PARSE`.\nIt is an error if symbols named in `:USING-VECTOR` are also specified in\n`:USING`.  Note that `VECTOR-PUSH` neither modifies the array nor signals a\ncondition if the fill pointer indicates the array is full; it simply returns NIL\nand the value is not stored.\n\nAside from other possible performance or logic utility, the use of vectors\nenables you to ask how many template assignments were matched and executed\nby querying the vector's fill pointer (e.g. with `FILL-POINTER` or `LENGTH`).  
While you can query ordinary\n`PARSE` bound vars to see if they were not matched, use of `:USING-VECTOR`\nmay be a more efficient way to get a count if the input is unlikely to\nmatch bindings in a predetermined fashion.\n\nExample:\n\n    (let ((v (make-array 5 :fill-pointer 0)))\n      (parse :using-vector (v) \"abc def\" (v v v)) ;=\u003e (\"abc\" \"def\" \"\")\n      v)\n    =\u003e #(\"abc\" \"def\")\n\nVector symbols in the template may not be used as input to positional\ndirectives in the way that ordinary symbols are used. If you want to use a\nvalue that a previous template binding stored in a vector as input to a\npositional pattern, you'll need to AREF the previously matched slot in the\npositional pattern's s-exp. For example:\n\n    (let ((v (make-array 4 :fill-pointer 0)))\n      (parse (:using-vector v) \"04Mark\" (1 v (+ 2) v (\u003e (aref v 0)))\n        v))\n    =\u003e #(\"04\" \"Mark\")\n\nSome notes on return values when using vectors\n_when there is no `\u0026BODY`_ provided to the parse:\n\n1. `PARSE` normally returns a list of the values of all symbols\n   with binding specifications in the template, so for the `\"abc def\"` example\n   above it would normally return `(\"abc\" \"def\" \"\")`.  However, while there are three binding expressions\n   in that template, there are only two _assignments_ (only two\n   matches), so the vector only receives two values.\n\n2. `PARSE` endeavors to return values (again, when there isn't a BODY)\n   such that the bound/assigned values are the same whether vector or\n   non-vector variables are used in the template.\n\n   This would be difficult with vectors, since we neither initialize the\n   vector nor know that `(aref v \u003cn\u003e)` has any meaningful value, or is\n   even accessible, if it wasn't assigned by the parse.  
For example:\n\n    `(parse :using-vector (v) \"a b\" (v v v)) =\u003e ???`\n\n   If you'd done `(parse \"a b\" (a a a))`, it would return `(\"\")`, the last\n   value matched for the only template binding symbol.\n   \n   For this reason, without a BODY specification we do extra setup for\n   vectors and create shadow symbols for the bindings in the template that\n   specify vectors. The shadow symbols are initialized and assigned like regular\n   variables, solely so that we have something to make the PARSE default\n   return value compatible with similar uses of non-vector template binding\n   variables.  Thus the above example would return `(\"a\" \"b\" \"\")`, just as if\n   you'd said `(parse \"a b\" (a b c))`.\n\n   Similarly, `(parse :using-vector (v) \"a b\" (v b))` =\u003e `(\"a\" \"b\")`\n\n3. You may wish to supply a BODY that returns NIL (or some other value) so that\n   `PARSE` does not cons a list as its default result when you already have\n   the results in a vector or other previously allocated bindings; presumably\n   you specified :USING or :USING-VECTOR precisely to avoid consing.\n\n   Of course, supplying a BODY also suppresses maintenance of\n   the vector shadow symbols discussed in item 2.\n\n# User Extensible Behaviors\n\nThis section describes ways to extend REXXPARSE behavior by adding new\nscanners (pattern matchers), transformers, and options (*TBD* - maybe not options).\n\n# Future work\n\n## Cleanup and code improvements\n\nIt took me longer than expected to grok how REXX' PARSE command works, and\nmy implementation took many twists and turns as I went down that road of\ndiscovery.  The result is some code I don't like that could undoubtedly be\nstreamlined to express the REXX semantics better and work a bit faster.\nThis is especially true of the scanners and of the calling logic that decides what to do\nwith the scanner data, e.g. 
figuring out what is supposed to be extracted\nwith the semantics of '\u003e' vs '+'. \n\n## Regexp patterns\n\nThe base `REXXPARSE` capabilities emulate REXX, and by design do not\nincorporate regular expressions into the functionality.  However, I was\nthinking it would be nice to allow regular expressions for people who want them, in a\nseparate ASDF lisp system that combines CL-PPCRE and REXXPARSE into a\n`REXXPARSE-RE` system, so that regexp string scanning is done via the\nuser-extensible interface to REXXPARSE. That is, in addition to plain string\nliteral patterns, you'd have regexp patterns as well. Note that this\nwouldn't change the inverted matching style of `PARSE`; it would just\naugment what could be matched.\n\nIt isn't clear that anybody will ever use REXXPARSE, much less a\nhypothetical REXXPARSE-RE, so this is unlikely to appear without some\nindication of interest.\n\n## Some real use of the extension mechanisms\n\nThe main tentative extension mechanisms now are `*OPTIONS*` and\n`*PATTERN-\u003eSCANNER*`.  I've chosen to use special variables and NOT generic\nfunctions so that, in the unlikely event you had two users of REXXPARSE in\nthe same lisp system, they could both extend the behavior without\nclobbering each other, which generic functions would not support.\n\nHowever, I haven't actually tested this, so consider the extension\nmechanisms a work in progress until someone builds something like\n`REXXPARSE-RE` above, or other things, to battle-test the extension logic.\nThat is, there may be breaking changes (to the extension mechanisms only) until\nI know they're usable.\n","funding_links":[],"categories":["Interfaces to other package managers"],"sub_categories":["Third-party APIs"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdtenny%2Frexxparse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdtenny%2Frexxparse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdtenny%2Frexxparse/lists"}