{"id":17312234,"url":"https://github.com/deathking/slex","last_synced_at":"2026-01-06T09:05:44.585Z","repository":{"id":149095950,"uuid":"91145964","full_name":"DeathKing/SLeX","owner":"DeathKing","description":"Yet another not-so-simple lex analysor generator and naïve regular expression engine written in programming language Scheme.","archived":false,"fork":false,"pushed_at":"2017-05-18T14:57:48.000Z","size":340,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-01T06:27:45.697Z","etag":null,"topics":["automation","dfa","lex","lisp","scheme"],"latest_commit_sha":null,"homepage":"","language":"Scheme","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DeathKing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-05-13T03:16:18.000Z","updated_at":"2023-04-15T21:09:00.000Z","dependencies_parsed_at":"2023-04-07T08:05:00.006Z","dependency_job_id":null,"html_url":"https://github.com/DeathKing/SLeX","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeathKing%2FSLeX","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeathKing%2FSLeX/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeathKing%2FSLeX/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeathKing%2FSLeX/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DeathKing","download_url":"https://codeload.github.com/DeathKing/SLeX/tar.gz/refs/hea
ds/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245761301,"owners_count":20667895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","dfa","lex","lisp","scheme"],"created_at":"2024-10-15T12:42:50.776Z","updated_at":"2026-01-06T09:05:44.547Z","avatar_url":"https://github.com/DeathKing.png","language":"Scheme","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg width=\"90%\" src=\"./assets/logo.png\" alt=\"SLeX Logo\"\u003e\n\u003c/p\u003e\n\nYet another not-so-simple lexical analyzer generator and naïve regular expression engine written in the Scheme programming language.\n\n**Warning!** This project is still evolving, and most of the API is likely to change.\n\n## Use SLeX as Regular expression engine\n\nA [regular expression](https://en.wikipedia.org/wiki/Regular_expression) is usually a sequence of characters that defines a search pattern.\n\nIn most programming languages, a regular expression is written as a string or a literal. For now, SLeX only accepts RE-IR (Intermediate Representation) to define an RE (see [Roadmap](#roadmap)). SLeX provides:\n\n  + 1 primitive RE-IR constant:\n    1. `eps`: eat none of the input characters.\n  + 6 primitive constructors:\n    1. `sig`, `sig*`: eat a single character in the given char-set.\n    2. `sig-co`, `sig*-co`: eat any single character that is not in the given char-set.\n    3. `exact`: match exactly the given string (i.e. sequence of chars), case-sensitively.\n    4. 
`exact-ci`: match exactly the given string, but case-insensitively.\n  + and another 5 RE-IR combinators:\n    1. `alt`\n    2. `seq`\n    3. `rep?`\n    4. `rep+`\n    5. `kln*`\n\n### primitives and combinators\n\nThe constant `eps` is an ε-RE, which matches whatever you give it:\n\n```scheme\n(define RE-eps (RE/compile eps))\n\n(RE/matches? RE-eps \"\")     ; ==\u003e #t\n(RE/matches? RE-eps \"asdf\") ; ==\u003e #t\n```\n\nGiven a char-set, `sig` constructs an RE that matches exactly one char in that candidate set. `sig*`, on the other hand, receives multiple chars as arguments and packs them into a char-set.\n\n```scheme\n(define RE-digit (RE/compile (sig slex:digit)))\n; equivalent definition using sig*\n(define RE-digit (RE/compile (sig* #\\0 #\\1 #\\2 #\\3 #\\4 #\\5 #\\6 #\\7 #\\8 #\\9)))\n\n(RE/matches? RE-digit \"1\")    ; ==\u003e #t\n(RE/matches? RE-digit \"a\")    ; ==\u003e #f\n(RE/matches? RE-digit \"12\")   ; ==\u003e #f\n```\n\nThere are some pre-defined char-sets that you can use:\n\n```scheme\n(define slex:digit       char-set:numeric)\n(define slex:upper-case  char-set:upper-case)\n(define slex:lower-case  char-set:lower-case)\n(define slex:alpha       char-set:alphabetic)\n(define slex:alphanum    char-set:alphanumeric)\n(define slex:whitespace  char-set:whitespace)\n(define slex:standard    char-set:standard)\n(define slex:graphic     char-set:graphic)\n```\n\n`exact` and `exact-ci` both match exactly the given string; `exact-ci` does so case-insensitively:\n\n```scheme\n(define RE-switch0 (RE/compile (exact \"switch0\")))\n(define RE-switch0-ci (RE/compile (exact-ci \"switch0\")))\n\n(RE/matches? RE-switch0 \"switch0\")     ; ==\u003e #t\n(RE/matches? RE-switch0 \"SWITCH0\")     ; ==\u003e #f\n(RE/matches? RE-switch0-ci \"sWiTCh0\")  ; ==\u003e #t\n```\n\n`alt` makes an alternation of two or more REs:\n\n```scheme\n(define RE-peculiar-identifier (RE/compile (alt (sig* #\\+ #\\-) (exact \"...\"))))\n\n(RE/matches? RE-peculiar-identifier \"+\")     ; ==\u003e #t\n(RE/matches? 
RE-peculiar-identifier \"...\")   ; ==\u003e #t\n(RE/matches? RE-peculiar-identifier \"+..\")   ; ==\u003e #f\n```\n\n`seq` concatenates two or more REs in sequence:\n\n```scheme\n; equivalent to (exact \"...\")\n(define RE-ldots (RE/compile (seq (sig* #\\.) (sig* #\\.) (sig* #\\.))))\n\n(RE/matches? RE-ldots \"...\")   ; ==\u003e #t\n(RE/matches? RE-ldots \".\")     ; ==\u003e #f\n(RE/matches? RE-ldots \"....\")  ; ==\u003e #f\n```\n\n`rep?` means that the RE should appear zero times or exactly once:\n\n```scheme\n; positive-integer may have a plus sign prefix\n(define RE-positive-integer-revised\n        (RE/compile (seq (rep? (sig* #\\+)) positive-integer)))\n\n(RE/matches? RE-positive-integer-revised \"+012345\")   ; ==\u003e #t\n```\n\n`kln*` creates the so-called **Kleene closure** of an RE:\n\n```scheme\n; for simplicity, we suppose that an integer may start with '0'\n(define RE-positive-integer (RE/compile (kln* slex:digit)))\n\n(RE/matches? RE-positive-integer \"123456\")   ; ==\u003e #t\n(RE/matches? RE-positive-integer \"012345\")   ; ==\u003e #t\n(RE/matches? RE-positive-integer \"0.1234\")   ; ==\u003e #f\n```\n\n### use RE to match pattern\n\nIf an RE r is equivalent to a DFA M, then a string s ∈ L(r) iff δ' q0 s ∈ F or partial-δ q0 s \"\" = f s \"\".\n\n### use RE to scan pattern\n\nThe `scan` function returns all the possible matches that occur in the text.\n\n### use RE to substitute pattern\n\n## Use SLeX to generate token analyzer\n\nTo analyze a token stream, define a lexer with the `define-lex` special form and then add definitions and rules to it. The `define-lex` special form has the following syntax:\n\n```scheme\n(define-lex \u003clex-id\u003e {\u003cdefinition-exps\u003e} \u003crule-exps\u003e)\n\n;;; where\n;;;   \u003cdefinition-exps\u003e ::= (\u003cdef-type\u003e \u003cdef-clause\u003e ... 
)\n;;;   \u003cdef-type\u003e    ::= definitions | definitions*\n;;;   \u003cdef-clause\u003e  ::= (\u003cidentifier\u003e \u003cscheme-exp\u003e)\n;;;\n;;;   \u003crule-exps\u003e   ::= (rule \u003crule-clause\u003e ... {\u003cdefault-exp\u003e})\n;;;   \u003crule-clause\u003e ::= (\u003cpattern\u003e \u003caction\u003e)\n;;;   \u003cpattern\u003e     ::= \u003cscheme-exp\u003e\n;;;   \u003caction\u003e      ::= \u003cscheme-exp\u003e\n;;;   \u003cdefault-exp\u003e ::= (default \u003cscheme-exp\u003e)\n```\n\nThe semantics of `define-lex` can be described as:\n\n\u003e The special form `define-lex` defines a named lexer under a lexical environment extended by the keyword `definition` (using `let`) or `definitions` (using `let*`), whose pattern-action pairs are specified by the keyword `rule`. The `\u003cpattern\u003e` of a `\u003crule-clause\u003e` must be an expression that evaluates to an RE-IR or RE-string, and the `\u003caction\u003e` of a `\u003crule-clause\u003e` must be an expression that evaluates to `self-evaluating` data or a three-argument procedure.\n\nThis may be confusing, so let's take `example/simple-lang.scm` as an example:\n\n```scheme\n(define-lex simple-L\n  (definition\n    (integer     (rep+ (sig slex:digit)))\n    (identifier  (rep+ (sig slex:alpha)))\n    (func        (sig* #\\+ #\\-))\n    (keyword-set (exact-ci \"set!\"))\n    (delimiter   (sig slex:whitespace))\n\n    ; action procedures can be defined here too\n    (sv-token\n      (lambda (color-func)\n        (lambda (token-str start-at)\n          (color-func token-str)))))\n\n  (rule\n    (integer     (sv-token string-white))\n    (keyword-set (sv-token string-red))\n    (identifier  (sv-token string-green))\n    (func        (sv-token string-blue))\n    (delimiter   Lex/action:token-str)\n    (default     Lex/action:handle-error)))\n```\n\n\n## Theoretical details of SLeX\n\n### Roadmap\n\n```text\n ~~~~~~~~~~~~~\n   ^      String                (DFANode . 
List\u003cDFANode\u003e)\n   |       +\\----+                    /\n  TODO     |  RE |                 DFA\n   |       +--+--+                  ^\n   v    Parse |                     | NFA-\u003eDFA\n ~~~~~~~~~~~~ v                     | \n            RE IR ---------------\u003e NFA\n             /        RE-\u003eNFA         \\\n    (Symbol . Char-set | RE)   (NFANode . NFANode)        \n\n```\n\nA DFA M is a 5-tuple, (Q, Σ, δ, q0, F), consisting of\n\n  + a finite set of states (Q)\n  + a finite set of input symbols called the alphabet (Σ)\n  + a transition function (δ : Q × Σ → Q)\n  + an initial or start state (q0 ∈ Q)\n  + a set of accept states (F ⊆ Q)\n\nIt is useful to extend the transition function so that it receives a string as its second argument. We may define the extended function δ' as (where `$` stands for end-of-string):\n\n```haskell\nδ' : Q x Σ* -\u003e Q\nδ' q $ = q\nδ' q (w :: ws) = δ' (δ q w) ws\n```\n\nSometimes, when defining a `scan`-like function, a partial transition function is also useful. It takes a state, the prefix consumed so far, and the remaining input, and returns the consumed prefix paired with the unconsumed rest:\n\n```haskell\npartial-δ : Q x Σ* x Σ* -\u003e Σ* x Σ*\npartial-δ q xs $ = xs :: \"\"\npartial-δ q xs (w :: ws) =\n    if δ q w == fail then xs :: (w :: ws)\n    else partial-δ (δ q w) (append xs w) ws\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeathking%2Fslex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeathking%2Fslex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeathking%2Fslex/lists"}