# recursive-descent-parser
Learning how to write a compiler

## Building
`npm run build` or `./build.sh`

## Methods described
There are several steps to running source code.
A series of passes over the data makes it easier to handle:

### Tokenize
The first step scans through the source code as a string
and returns a series of tokens, identified by their:
1. `type` - defines syntactic usage, such as identifier, keyword, operator, brackets, etc.
2. `data` - typically the string represented by the type, but it can be transformed by the preprocessor.
   For instance, a preprocessor could take several tokens:
   `{type: "parenthesis", data:"("}`,
   `{type: "parenthesis", data:")"}`,
   `{type: "operator", data:"="}`,
   `{type: "operator", data:">"}`

   and turn them into

   `{type: "arrow-function", data:"()=>"}`
3. line and char numbers (useful for debugging source)

### Preprocess
This part is still in the works, but it will essentially
be a function that passes over tokens and returns a
modified set.

What those modifications actually entail is up to the preprocessor,
but some examples are:
- source directives
- `.babelrc`
- special language features not supported by the parser
  that can be broken down into lower-level code

### Parser
Creates a tree structure, called an Abstract Syntax Tree or AST,
from a token array.

This is where the recursive descent part comes into play, and the part I came here to learn about.

### Interpreter / Codegen
I plan on implementing both an interpreter and a code generator.

They will take an abstract syntax tree and either
- run it (interpreter)
- or compile it (codegen)

into some lower-level code (typically opcodes, or machine code).

## Implementation
In my process I've decided to take a language-agnostic
approach, even though my end goal is probably
something like `typescript/javascript`.

For instance, the tokenizer process actually relies
on a `Scanner`, which is where language syntax is actually handled;
the `tokenize` function is already implemented for you.

To handle your own language, you'll need to implement
a scanner subclass.

### Scanner
This is a class meant to be extended.
It provides functionality for scanning text
in a more standard way, which should make debugging easier.

- addPass - for adding more syntax handling
  ```ts
  addPass(name: string, pass: ScannerPass): this
  ```
  Where `name` is the token type used when the pass is successful,
  and `pass` is a [scanner
pass](#ScannerPass).

### ScannerPass
Each scanner pass is meant to handle a single type of
language syntax.

```ts
(data: string, offset: number): ScannerData
```
Where `data` is the source code,
`offset` is the offset in the source to read from,
and the return value is expected to be a [ScannerData](#ScannerData).

### ScannerData
```ts
{
  success: boolean //must be false when no data satisfying this pass's type is found at offset
  readChars: number //chars that fit this type before we read something we didn't like
  readLines: number //obsolete, this will be handled by internal code soon
  error?: string //optional - set only when an error is positively identified, not every time success == false
}
```
Note that scanner data does not actually return the text that was read, only the char count.
This standardizes the reading process, which should lead to far fewer errors
across language implementations.

Basically: don't allow reading of chars that don't fit your specifications, and don't count ones that don't.
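To make the contract above concrete, here is a minimal sketch of a `Scanner` subclass registering two passes. The class internals, the `MyLangScanner` name, and the `passes` map are assumptions for illustration, not the actual implementation; only the `addPass` signature and `ScannerData` shape come from this README.

```ts
// Shapes as documented above.
interface ScannerData {
  success: boolean;   // false when nothing at `offset` satisfies this pass
  readChars: number;  // only chars that actually fit the token are counted
  readLines: number;  // obsolete per the README, kept for shape-compatibility
  error?: string;     // set only on a positively identified error
}

type ScannerPass = (data: string, offset: number) => ScannerData;

// Hypothetical base class: the real Scanner likely does more.
class Scanner {
  passes = new Map<string, ScannerPass>();
  // `name` becomes the token type when the pass succeeds
  addPass(name: string, pass: ScannerPass): this {
    this.passes.set(name, pass);
    return this; // chainable, so passes can be registered fluently
  }
}

// Toy-language subclass: a pass for digit runs, and a string pass
// showing when `error` is warranted (a definite malformation, not a miss).
class MyLangScanner extends Scanner {
  constructor() {
    super();
    this.addPass("number", (data, offset) => {
      let read = 0;
      while (/[0-9]/.test(data[offset + read] ?? "")) read++;
      return { success: read > 0, readChars: read, readLines: 0 };
    }).addPass("string", (data, offset) => {
      if (data[offset] !== '"') return { success: false, readChars: 0, readLines: 0 };
      let read = 1;
      while (offset + read < data.length && data[offset + read] !== '"') read++;
      if (offset + read >= data.length) {
        // an opening quote with no close is a real error, not just "no match"
        return { success: false, readChars: 0, readLines: 0, error: "unterminated string" };
      }
      return { success: true, readChars: read + 1, readLines: 0 };
    });
  }
}
```

Note how each pass counts only the chars it accepts: the number pass stops at the first non-digit and the string pass reports zero chars on failure, matching the "don't count chars that don't fit" rule.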
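Returning to the Tokenize section's example, a preprocessor pass that merges `(`, `)`, `=`, `>` into a single `arrow-function` token could be sketched as follows. The `Token` field set matches the fields described in that section; the `mergeArrowFunctions` name and merging strategy are illustrative assumptions.

```ts
// Token shape per the Tokenize section: type, data, plus line/char for debugging.
interface Token {
  type: string; // syntactic usage: identifier, keyword, operator, ...
  data: string; // the source text this token represents
  line: number;
  char: number;
}

// Hypothetical preprocessor pass: scan the token stream and collapse
// the four-token sequence `(` `)` `=` `>` into one arrow-function token.
function mergeArrowFunctions(tokens: Token[]): Token[] {
  const out: Token[] = [];
  for (let i = 0; i < tokens.length; i++) {
    const [a, b, c, d] = tokens.slice(i, i + 4);
    if (a?.data === "(" && b?.data === ")" && c?.data === "=" && d?.data === ">") {
      // position the merged token where the sequence started
      out.push({ type: "arrow-function", data: "()=>", line: a.line, char: a.char });
      i += 3; // skip the three consumed tokens
    } else {
      out.push(tokens[i]);
    }
  }
  return out;
}
```

A preprocessor would be a chain of such functions, each taking a token array and returning a modified set.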