{"id":19347515,"url":"https://github.com/mrlsd/semantic-analyzer-rs","last_synced_at":"2025-04-06T04:12:15.034Z","repository":{"id":65253061,"uuid":"445261934","full_name":"mrLSD/semantic-analyzer-rs","owner":"mrLSD","description":"Semantic analyzer library for compilers written in Rust for semantic analysis of programming languages AST","archived":false,"fork":false,"pushed_at":"2025-02-27T12:48:31.000Z","size":1676,"stargazers_count":39,"open_issues_count":2,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-30T03:07:27.258Z","etag":null,"topics":["abstract-syntax-tree","compiler","compiler-construction","compiler-design","programming-language","semantic-analysis","semantic-analyzer"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mrLSD.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-06T17:54:12.000Z","updated_at":"2025-02-27T12:46:30.000Z","dependencies_parsed_at":"2023-01-15T23:00:38.344Z","dependency_job_id":"7b754a6c-d71e-49e3-9c4a-e860218d276e","html_url":"https://github.com/mrLSD/semantic-analyzer-rs","commit_stats":null,"previous_names":["mrlsd/semantic-analyzer-rs"],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrLSD%2Fsemantic-analyzer-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrLSD%2Fsemantic-analyzer-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrLSD%2Fsema
ntic-analyzer-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrLSD%2Fsemantic-analyzer-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mrLSD","download_url":"https://codeload.github.com/mrLSD/semantic-analyzer-rs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247430872,"owners_count":20937874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abstract-syntax-tree","compiler","compiler-construction","compiler-design","programming-language","semantic-analysis","semantic-analyzer"],"created_at":"2024-11-10T04:16:45.705Z","updated_at":"2025-04-06T04:12:15.017Z","avatar_url":"https://github.com/mrLSD.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Lints](https://github.com/mrLSD/z-rose/actions/workflows/lints.yml/badge.svg)](https://github.com/mrLSD/z-rose/actions/workflows/lints.yml)\n[![Tests](https://github.com/mrLSD/z-rose/actions/workflows/tests.yml/badge.svg)](https://github.com/mrLSD/z-rose/actions/workflows/tests.yml)\n[![Crates.io version](https://img.shields.io/crates/v/semantic-analyzer.svg?style=flat-square)](https://crates.io/crates/semantic-analyzer)\n[![codecov](https://codecov.io/gh/mrLSD/semantic-analyzer-rs/graph/badge.svg?token=ZQ8FCYSSZX)](https://codecov.io/gh/mrLSD/semantic-analyzer-rs)\n\n\u003cdiv style=\"text-align: center;\"\u003e\n    
\u003ch1\u003emrLSD\u003ccode\u003e/semantic-analyzer-rs\u003c/code\u003e\u003c/h1\u003e\n\u003c/div\u003e\n\nSemantic analyzer is an open source semantic analyzer for programming languages \nthat makes it easy to build your own efficient compilers with extensibility in mind.\n\n## 🌀 What the library is for and what tasks it solves\n\nCreating a compiler for a programming language is a process that involves several key \nstages. Most commonly they are:\n\n▶️ **Lexical Analysis (Lexer)**: This stage involves breaking down the input stream \nof characters into a series of tokens. Tokens are the atomic elements of the programming language, such as identifiers, keywords, operators, etc.\n\n▶️ **Syntax Analysis (Parsing)**: At this stage, the tokens obtained in the previous \nstage are grouped according to the grammar rules of the programming language. The result \nof this process is an **Abstract Syntax Tree (AST)**, which represents a hierarchical structure of the code.\n\n⏩ **Semantic Analysis**: This stage involves checking the semantic correctness of the code. This can include \ntype checking, scope verification of variables, etc.\n\n▶️ **Intermediate Code Optimization**: At this stage, the compiler tries to improve the intermediate representation of the code to make it more efficient. \nThis can include dead code elimination, expression simplification, etc.\n\n▶️ **Code Generation**: This is the final stage where the compiler transforms the optimized intermediate representation (IR) into \nmachine code specific to the target architecture.\n\nThis library implements the **Semantic Analysis** stage.\n\n### 🌻 Features\n\n✅ **Name Binding and Scope Checking**: The analyzer verifies that all variables, constants, and functions are declared before they're used, \nand that they're used within their scope. 
It also checks for name collisions, where variables, constants, functions, or types in the same scope have the same name.\n\n✅ **Checking Function Calls**: The analyzer verifies that functions are called with the correct number of parameters and that the types of the \narguments match the types expected by the function.\n\n✅ **Scope Rules**: Checks that variables, functions, constants, and types are used within their scope and are available in the visibility scope.\n\n✅ **Type Checking**: The analyzer checks that operations are performed on compatible types for expressions, functions, constants, and bindings.\nFor operations in expressions, this means verifying that the types of expressions are consistent with their usage in the context.\n\n✅ **Flow Control Checking**: The analyzer checks that the control flow statements (if-else, loop, return, break, continue) are used correctly. \nCondition expressions are supported, and their correctness is checked.\n\n✅ **Building the Symbol Table**: The symbol table is the data structure the semantic analyzer uses to keep track of \nsymbols (variables, functions, constants) in the source code. Each entry in the symbol table contains the symbol's name, type, the scope it belongs to in the block state, and other relevant information.\n\n✅ **Generic expression value**: The ability to expand custom expressions for AST,\naccording to compiler requirements, and the ability to implement custom instructions \nfor these custom expressions in the **Semantic Stack Context**.\n\n### 🌳 Semantic State Tree\n\nThe result of running all stages of the semantic analyzer is the **Semantic State Tree**.\n\nIt can be used for Intermediate Code Generation, for further passes such as\nsemantic tree optimizations and linting, and for backend codegen (like LLVM) for the target machine.\n\n#### 🌲 Structure of Semantic State Tree \n\n- **blocks state** and related block state child branches. It's a basic\nentity for scopes: variables, blocks (function, if, loop). 
\nThis is especially relevant for expressions. It allows you to granularly separate the visibility scope \nand its visibility limits. In particular, all child elements can access parent elements.\nHowever, parent elements cannot access child elements, which effectively limits the visibility scope and entity usage.\n\n  - **variables state**: block state entity, contains the properties of a variable in the current\n  state, such as: name, type, mutability, allocation, mallocation.\n\n  - **inner variables state**: block state entity, contains inner variable names.\n  It's useful for Intermediate Representation for codegen backends like LLVM,\n  where shadowed variables should have different inner names. This means inner variable names are\n  always unique.\n\n  - **labels state**: block state entity that contains all information about control flow labels.\n\n- **Global state**: contains global state of constants, declared functions and types.\n\n- **State entity**: contains: \n  - Global State \n  - Errors results\n  - Semantic tree results\n\nAll of this is source data that can be used for Intermediate Representation, for subsequent optimizations, and for compiler codegen.\n\n### 🧺 Subset of programming languages\n\nThe input parameter for the analyzer is a predefined\nAST (abstract syntax tree). The only dependency, used as a library for building the AST, is\n[nom_locate](https://github.com/fflorent/nom_locate), which allows getting\nall the necessary information about the source code for further semantic analysis\nand for generating relevant and informative error messages. Currently,\nthe AST is a fixed structure because it is a fundamental\nelement that defines the lexical representation of a programming language.\n\nOn the other hand, it allows you to implement any subset of the programming language that matches\nthe syntax tree. It also implies a subset of lexical representations from which an AST can be generated \nthat meets the initial requirements of the semantic analyzer. 
As a library for lexical \nanalysis and source code parsing, it is recommended to use [nom, a parser combinators library](https://github.com/rust-bakery/nom).\n\nThe AST represents a **Turing complete** programming language and contains all the necessary elements for this.\n\n## 🔋 🔌 Extensibility\n\nThe `AST` is predefined, but in real conditions it may be necessary to expand the \nfunctionality for the specific needs of the `compiler`. For this reason, the `AST` \nis extensible, and an additional set of `Instructions` can be generated for \nthe **Semantic Stack Context**.\n\n- [x] 🚨 **Generic expression value**: The ability to expand custom expressions for AST, according to compiler requirements, \nand the ability to implement custom instructions for these custom expressions in the \n**Semantic Stack Context**.\n\n## 🛋️ Examples\n\n- 🔎 An example implementation is available as a separate project: [💾 Toy Codegen](https://github.com/mrLSD/toy-codegen).\nThe project takes the `SemanticStack` results and converts them into **Code Generation** logic, which clearly shows \nhow the `semantic-analyzer-rs` `SemanticStackContext` results can be used. LLVM is used as the \nbackend, with [inkwell](https://github.com/TheDan64/inkwell) as the library for LLVM codegen, and the result is compiled into an executable \nprogram. The source of data is the AST structure itself.\n\n## 📶 Features\n\nAvailable library Rust features:\n- `codec` - 💮 enable serialization and deserialization with `Serde`.\n  This is especially convenient when forming the AST, for Codegen, and for obtaining \n  a serialized representation of the `SemanticState`. Another important \n  nuance is that any library that supports `Serde` can act as a \n  serializer `codec`. 
Example formats include `json`, `toml`, `yaml`, \n  `binary`, and many others that work with the `serde` library.\n  The main entities to which the `codec` feature applies are:\n  - [x] `AST` ↪️ The AST data source can be provided in serialized form.\n    This is especially useful for designing and testing `Codegen` and the AST data \n    transfer pipeline, and it also lets any programming language that can \n    generate serialized AST data act as an AST source.\n  - [x] `State` ↪️ `SemanticState` can be obtained in the form of \n    serialized data. This is especially convenient for storing state \n    before code generation with different parameters, post-analysis, or \n    optimizations, which makes it possible to work with already analyzed \n    data.\n  - [x] `SemanticStack` ↪️ contains a set of instructions for `Codegen`. \n    Representation in serialized form may be convenient for cases such as code \n    generation without repeated semantic analysis, based only on the \n    instructions for the code generator produced by the `semantic analyzer`. \n    A serialized `SemanticStack` opens up wide \n    possibilities for using any third-party code generators and compilers \n    implemented in any programming language.\n\n## MIT [LICENSE](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrlsd%2Fsemantic-analyzer-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmrlsd%2Fsemantic-analyzer-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrlsd%2Fsemantic-analyzer-rs/lists"}