{"id":21033313,"url":"https://github.com/sameesunkaria/swiftregex","last_synced_at":"2025-05-15T13:31:59.671Z","repository":{"id":90242733,"uuid":"208459711","full_name":"Sameesunkaria/SwiftRegex","owner":"Sameesunkaria","description":null,"archived":true,"fork":false,"pushed_at":"2021-11-14T22:19:25.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-13T20:14:34.672Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sameesunkaria.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-14T15:26:11.000Z","updated_at":"2023-08-20T21:10:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"0afd947d-1965-447d-8161-94cf8bfe677a","html_url":"https://github.com/Sameesunkaria/SwiftRegex","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sameesunkaria%2FSwiftRegex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sameesunkaria%2FSwiftRegex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sameesunkaria%2FSwiftRegex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sameesunkaria%2FSwiftRegex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sameesunkaria","download_url":"https://codeload.github.com/Sameesunkaria/SwiftRegex/tar.g
z/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254349448,"owners_count":22056350,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T12:54:09.175Z","updated_at":"2025-05-15T13:31:59.663Z","avatar_url":"https://github.com/Sameesunkaria.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SwiftRegex\n\n## Introduction \n\nSwiftRegex is a regular expression parser, written in Swift 5.0. It was originally created as part of a university assignment and is hosted here for future reference.\n\nSwiftRegex has a type called `RegexParser`, which is responsible for taking in a regular expression in the form of a string, and converting it into an NFA or further into a DFA. \n\nThe regular expression can have all Unicode symbols as part of the alphabet, except `|`, `*`, `(` and `)`. The first two represent the \"or\" operation (`|`) and the \"Kleene Closure\" operation (`*`). The parentheses are used to define the precedence of operations. All other characters are treated as a part of the alphabet.\n\nSwiftRegex supports three operations: \"Kleene Closure\", \"or\" and \"concatenation\". If no operation is defined between two characters, a \"concatenation\" is inferred.\n\n## Running the code\n\nThere is a single Swift file, titled `main.swift`. 
If you have the Swift compiler installed on your system, you can run the code with the command `swift main.swift`.\n\nIf you do not have the Swift compiler installed, there are several online compilers (or playgrounds) for Swift. I have successfully tested this project on [https://repl.it/languages/swift](https://repl.it/languages/swift).\n\n## Usage\n\nThe struct `RegexParser` can be initialized with a regular expression in the form of a string. By default `RegexParser` will generate an NFA for the regular expression it was initialized with. \n\n```swift\nlet parser = RegexParser(regex: \"(a|b)*abb\")\n```\n\nAt the time of initialization, we can pass in `true` for `reduceIntoDFA`, if we want the automaton to be converted from an NFA into a DFA. \n\n```swift\nlet parser = RegexParser(regex: \"(a|b)*abb\", reduceIntoDFA: true)\n```\n\nTo evaluate if a string belongs to the language defined by the regular expression passed into the `RegexParser`, we can use the `recognize(string:)` method on `RegexParser`. This method returns a `Bool`, indicating whether the test string was recognized as part of the language.\n\n```swift\nparser.recognize(string: \"aabb\")\n```\n\nYou can also get a basic description of the automaton created by using the `printBasicAutomatonDescription` method on `RegexParser`. This prints out the number of nodes in the graph representing the underlying structure of the automaton. It also shows whether each node is a final node, and lists its transitions. 
\n\n```swift\nparser.printBasicAutomatonDescription()\n```\n\nThe output should look similar to this:\n\n```\nDescription of automation representing: (a|b)*abb\nEach line represents a node.\n\nThere are a total of 5 nodes.\nState    isFinal        Transition states\n0        false         [\"b -\u003e 4\", \"a -\u003e 0\"]\n1        true          [\"b -\u003e 3\", \"a -\u003e 0\"]\n2        false         [\"b -\u003e 3\", \"a -\u003e 0\"]\n3        false         [\"b -\u003e 3\", \"a -\u003e 0\"]\n4        false         [\"b -\u003e 1\", \"a -\u003e 0\"]\n```\n\n## Implementation\n\n### Data structure representing the automaton\n\nThe automaton is represented as a graph of nodes, each holding its transitions and a flag that indicates whether it is a final node.\n\n```swift\nclass Node {\n    var isFinal = false\n    var transitions = [Transition]()\n}\n```\n\nA `Transition` holds the transition state, and a reference to the next `Node`.\n\n```swift\nstruct Transition {\n    let state: Substring\n    let node: Node\n}\n```\n\n**NOTE:** An epsilon (empty) transition is denoted by an empty `state` string (`\"\"`).\n\n### Parsing the regular expression into NFA\n\nThe regular expression string is parsed recursively, in two steps. \n\nThe first step is tokenizing the string into a set of single expressions and operations. The second step is to convert the expressions into their graph representation and apply the operations.\n\nTokenizing the regular expression string is handled by the `generateTokens(expression:)` method. So, for our regular expression `(a|b)*abb`, the generated tokens are\n\n```\n[\"a|b\", \"*\", \"a\", \"b\", \"b\"]\n```\n\nNow, all expressions are converted into their graph representation. 
Single-character expressions are replaced with two nodes and a transition from the first node to the second one, with the transition state being the character itself.\n\nHere, `a|b` is also a single expression, but since it is not represented by a single character, it has to be evaluated separately before we can evaluate our complete expression. Hence, the recursive nature of this algorithm.\n\nOnce the tokens are generated and all the single expressions are evaluated, the operations are applied. \n\nThe operations on the single expressions are applied in sequence, in order of operator precedence.\n\n- Kleene Closure: `evaluateKleenClosures(on:)`\n- OR: `evaluateUnionOperations(on:)`\n- Concatenation: `evaluateConcatenation(on:)`\n\nAfter evaluating all the operations on the single expressions represented with a graph, a complete graph is generated. The final node then has its `isFinal` flag set to `true`. \n\nThis is done in the method `parseExpression`.\n\nThis expression is used to evaluate our test strings.\n\n### Converting the NFA into a DFA\n\nTo convert the NFA into a DFA, we need to have a set of all the nodes in our graph. To find that, a depth-first search algorithm is used.\n\nAll the nodes conform to the Swift protocol `Hashable`, and hence, can be stored in a Swift `Set`.\n\nNow we create an epsilon closure for each of the nodes. This closure is stored as a `Set\u003cNode\u003e`. This is crucial, since we need to perform set operations on the epsilon closures.\n\nWe also need to know all of the symbols in the alphabet of the language, which is done by the method `getInputSymbols(for:)`.\n\nFinally, the `generateDFA(from:symbols:)` method takes in the NFA expression and converts it into a DFA.\n\n### Recognizing strings\n\nTo recognize if a test string belongs in the language represented by a regular expression, the `recognize(string:)` method is used. 
This method recursively checks if the string belongs to the language.\n\nThe transition states of the parsed expression are compared against the first letter of the test string. If a state matches, the substring from the second letter onwards is checked against the expression starting from the next node after the transition.\n\nIn the case of an epsilon transition (transition state: `\"\"`), the complete string is compared against the expression starting from the next node after the transition.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsameesunkaria%2Fswiftregex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsameesunkaria%2Fswiftregex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsameesunkaria%2Fswiftregex/lists"}