{"id":19391237,"url":"https://github.com/vonderklaas/tiny-lexer","last_synced_at":"2026-02-10T12:33:52.469Z","repository":{"id":206799650,"uuid":"716157507","full_name":"vonderklaas/tiny-lexer","owner":"vonderklaas","description":"A program written in pure C language, that can perform lexical tokenization of an arbitrary programming language, 'tinylang' in this particular case.","archived":false,"fork":false,"pushed_at":"2024-05-07T07:57:40.000Z","size":52,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-24T02:43:10.705Z","etag":null,"topics":["c","lexer","lexer-parser","lexical-analysis"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vonderklaas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-08T14:59:01.000Z","updated_at":"2025-04-15T05:09:25.000Z","dependencies_parsed_at":"2024-11-10T10:37:37.053Z","dependency_job_id":null,"html_url":"https://github.com/vonderklaas/tiny-lexer","commit_stats":null,"previous_names":["garbalau-github/tiny-compiler","vonderklaas/tiny-lexer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vonderklaas/tiny-lexer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonderklaas%2Ftiny-lexer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonderklaas%2Ftiny-lexer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonderklaas%2Ftiny-lexer/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonderklaas%2Ftiny-lexer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vonderklaas","download_url":"https://codeload.github.com/vonderklaas/tiny-lexer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonderklaas%2Ftiny-lexer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259794117,"owners_count":22912247,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","lexer","lexer-parser","lexical-analysis"],"created_at":"2024-11-10T10:25:46.161Z","updated_at":"2026-02-10T12:33:52.462Z","avatar_url":"https://github.com/vonderklaas.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Description\n\nLexical tokenization is the conversion of text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a lexer program. In the case of a natural language, those categories include nouns, verbs, adjectives, punctuation, etc. 
In the case of a programming language, the categories include identifiers, operators, grouping symbols, and data types.\n\n### Examples\n\nThis is the source code:\n```c\na : integer = 0\na := 0\n\nb : integer\nb := 0\n\ndefun foo (a:integer, b:integer):integer {\n\n}\n```\n\nThese are the resulting tokens:\n```c\nToken 0: a\nToken 1: :\nToken 2: integer\nToken 3: =\nToken 4: 0\nToken 5: a\nToken 6: :\nToken 7: =\nToken 8: 0\nToken 9: b\nToken 10: :\nToken 11: integer\nToken 12: b\nToken 13: :\nToken 14: =\nToken 15: 0\nToken 16: defun\nToken 17: foo\nToken 18: (\nToken 19: a\nToken 20: :\nToken 21: integer\nToken 22: ,\nToken 23: b\nToken 24: :\nToken 25: integer\nToken 26: )\nToken 27: :\nToken 28: integer\nToken 29: {\nToken 30: }\n```\n\n\n### Compilation Stages\n\n**Preprocessing** — ✅ \u003cbr\u003e\nInput: Source Code \u003cbr\u003e\nOutput: Modified Source Code\n\n**Tokenization** — ✅ \u003cbr\u003e\nInput: Preprocessed Source Code \u003cbr\u003e\nOutput: Stream of Tokens\n\n(WIP)\n**Syntax Analysis** \u003cbr\u003e\nInput: Tokens from Lexical Analysis (Tokenization) \u003cbr\u003e\nOutput: AST\n\n(WIP)\n**Semantic Analysis** \u003cbr\u003e\nInput: AST \u003cbr\u003e\nOutput: Annotated AST with Semantic Information\n\n(WIP)\n**Intermediate Code Generation** \u003cbr\u003e\nInput: Annotated AST \u003cbr\u003e\nOutput: IR\n\n(WIP)\n**Optimization** \u003cbr\u003e\nInput: IR \u003cbr\u003e\nOutput: Optimized IR\n\n(WIP)\n**Code Generation** \u003cbr\u003e\nInput: Optimized IR \u003cbr\u003e\nOutput: Machine Code or Assembly\n\n**Linking** \u003cbr\u003e\nInput: Compiled Machine Code \u003cbr\u003e\nOutput: Single Executable for Specific Architecture\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvonderklaas%2Ftiny-lexer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvonderklaas%2Ftiny-lexer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvonderklaas%2Ftiny-lexer/lists"}