https://github.com/shubhamai/grad
A custom programming language grad and its compiler, written in Rust
- Host: GitHub
- URL: https://github.com/shubhamai/grad
- Owner: Shubhamai
- Created: 2024-04-11T17:12:12.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-26T18:33:11.000Z (5 months ago)
- Last Synced: 2024-06-27T22:50:49.423Z (5 months ago)
- Topics: egui, rust
- Language: Rust
- Homepage: https://grad-lang.vercel.app
- Size: 2.82 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# grad
This project implements a custom programming language, `grad`, and its compiler, written in Rust. The compiler transforms source code into bytecode in several stages, and a custom stack-based virtual machine (VM) then interprets that bytecode.
## Getting Started
Try the language in the [playground](https://grad-lang.vercel.app).
### Example
```bash
cargo install grad
echo "let a = 10; print(a);" > example.grad
grad run example.grad
```

## Table of Contents
1. [Compiler Overview](#compiler-overview)
2. [Lexical Analysis](#lexical-analysis)
3. [Parsing](#parsing)
4. [Abstract Syntax Tree (AST)](#abstract-syntax-tree-ast)
5. [Code Generation](#code-generation)
6. [Virtual Machine](#virtual-machine)
7. [String Interning](#string-interning)
8. [Example: Program Compilation and Execution](#example-program-compilation-and-execution)
9. [Future Improvements](#future-improvements)

## Compiler Overview
The compiler follows these main stages:
1. [Lexical Analysis](./src/scanner.rs) - Tokenizes the input source code.
2. [Parsing](./src/ast.rs) - Builds an Abstract Syntax Tree (AST).
3. [Code Generation](./src/compiler.rs) - Transforms the AST into bytecode.
4. [Virtual Machine](./src/vm.rs) - Executes the generated bytecode.

## Lexical Analysis
Lexical analysis is performed by the `Lexer` struct, which splits the input source code into a series of [`Token`s](./src/scanner.rs).
```rust
pub struct Lexer {
    pub tokens: Vec<Token>,
}

#[derive(Debug, PartialEq, Clone)]
pub struct Token {
    pub token_type: TokenType,
    pub lexeme: String,
    pub span: std::ops::Range<usize>,
}
```

The `Lexer` uses [logos](https://github.com/maciejhirsz/logos) to identify token types such as keywords, identifiers, literals, and operators. It also returns the span (start and end positions) of each token in the source code.
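To make the token and span idea concrete, here is a minimal sketch of a scanner that produces `Token` values with byte spans. It is hand-written rather than built on logos so that it has no dependencies, and the `TokenType` variants and the `tokenize` function are assumptions for illustration only; the real definitions live in `src/scanner.rs`.

```rust
use std::ops::Range;

// Hypothetical subset of token kinds, not the project's real `TokenType`.
#[derive(Debug, PartialEq, Clone)]
pub enum TokenType {
    Let,
    Print,
    Identifier,
    Number,
    Symbol,
}

#[derive(Debug, PartialEq, Clone)]
pub struct Token {
    pub token_type: TokenType,
    pub lexeme: String,
    pub span: Range<usize>,
}

// Hand-rolled stand-in for the logos-generated lexer.
pub fn tokenize(src: &str) -> Vec<Token> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let c = bytes[i];
        let token_type = if c.is_ascii_whitespace() {
            // Whitespace separates tokens but produces none.
            i += 1;
            continue;
        } else if c.is_ascii_alphabetic() || c == b'_' {
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            // Keywords are identifiers with a reserved spelling.
            match &src[start..i] {
                "let" => TokenType::Let,
                "print" => TokenType::Print,
                _ => TokenType::Identifier,
            }
        } else if c.is_ascii_digit() {
            while i < bytes.len() && (bytes[i].is_ascii_digit() || bytes[i] == b'.') {
                i += 1;
            }
            TokenType::Number
        } else {
            // Any other character becomes a one-character symbol token.
            // Advance by the full UTF-8 width so slicing stays on a char boundary.
            i += src[i..].chars().next().map_or(1, |ch| ch.len_utf8());
            TokenType::Symbol
        };
        tokens.push(Token {
            token_type,
            lexeme: src[start..i].to_string(),
            span: start..i,
        });
    }
    tokens
}

fn main() {
    for token in tokenize("let a = 10;") {
        println!("{:?}", token);
    }
}
```

Each span is a half-open byte range into the source, which is what later stages would need to point an error message at the offending text.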
## Parsing
The parsing stage is implemented using a recursive descent parser with [Pratt parsing](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html) for expressions.
```rust
pub struct Parser<'a> {
    lexer: &'a mut Lexer,
}
```

The parser uses methods like `parse_statement()`, `parse_expression()`, and various other parsing functions to build the Abstract Syntax Tree (AST).
The expression parsing uses the [Pratt parsing](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html) technique for handling operator precedence:
```rust
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> ParseResult {
    // ... (Pratt parsing implementation by matklad)
}
```

Each operator is given a left and a right binding power, so a single loop handles every precedence level and both left- and right-associative operators without one grammar rule per level.
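As an illustration of the technique, the following is a small self-contained Pratt parser in the style of the linked article. It is a sketch, not the project's parser: the `Tok` and `Tokens` types, the binding-power table, and the use of `^` as a one-character stand-in for a right-associative power operator are all assumptions, and it returns an S-expression string instead of an AST.

```rust
// Tokens for the sketch: single-digit numbers and one-character operators.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(u32),
    Op(char),
    Eof,
}

// Hypothetical token stream, stored reversed so `pop` yields the next token.
struct Tokens {
    toks: Vec<Tok>,
}

impl Tokens {
    fn new(src: &str) -> Self {
        let mut toks: Vec<Tok> = src
            .chars()
            .filter(|c| !c.is_whitespace())
            .map(|c| match c.to_digit(10) {
                Some(d) => Tok::Num(d),
                None => Tok::Op(c),
            })
            .collect();
        toks.reverse();
        Tokens { toks }
    }

    fn advance(&mut self) -> Tok {
        self.toks.pop().unwrap_or(Tok::Eof)
    }

    fn peek(&self) -> Tok {
        self.toks.last().copied().unwrap_or(Tok::Eof)
    }
}

// (left, right) binding powers; right < left makes an operator right-associative.
fn infix_bp(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        '^' => Some((6, 5)),
        _ => None,
    }
}

fn expr_bp(lexer: &mut Tokens, min_bp: u8) -> String {
    let mut lhs = match lexer.advance() {
        Tok::Num(n) => n.to_string(),
        Tok::Op('(') => {
            let inner = expr_bp(lexer, 0);
            assert_eq!(lexer.advance(), Tok::Op(')'));
            inner
        }
        t => panic!("unexpected token: {:?}", t),
    };
    loop {
        let op = match lexer.peek() {
            Tok::Op(op) => op,
            Tok::Eof => break,
            t => panic!("unexpected token: {:?}", t),
        };
        // Stop at anything that is not an infix operator, such as ')'.
        let (l_bp, r_bp) = match infix_bp(op) {
            Some(bp) => bp,
            None => break,
        };
        // An operator that binds less tightly than the caller requires
        // belongs to an enclosing call.
        if l_bp < min_bp {
            break;
        }
        lexer.advance();
        let rhs = expr_bp(lexer, r_bp);
        lhs = format!("({} {} {})", op, lhs, rhs);
    }
    lhs
}

fn parse(src: &str) -> String {
    expr_bp(&mut Tokens::new(src), 0)
}

fn main() {
    println!("{}", parse("1 + 2 * 3"));
    println!("{}", parse("2 ^ 3 ^ 2"));
}
```

The only difference between a left- and a right-associative operator in this scheme is which side of the pair carries the larger number.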
## Abstract Syntax Tree (AST)

The AST is represented using the `ASTNode` enum:
```rust
pub enum ASTNode {
    IntNumber(i64),
    FloatNumber(f64),
    Identifier(String),
    Boolean(bool),
    String(String),
    Op(Ops, Vec<ASTNode>),
    Callee(String, Vec<ASTNode>),
    Let(String, Vec<ASTNode>),
    Assign(String, Vec<ASTNode>),
    If(Vec<ASTNode>, Vec<ASTNode>, Option<Vec<ASTNode>>),
    While(Vec<ASTNode>, Vec<ASTNode>),
    Print(Vec<ASTNode>),
    Function(String, Vec, Vec),
    Block(Vec<ASTNode>),
}
```

This structure can represent the language's constructs: literals, variables, operators, function definitions and calls, and control flow statements.
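The sketch below shows how such a tree nests, using a reduced and hypothetical `Node` enum that mirrors a few `ASTNode` variants. The operator is a plain string here because the real `Ops` type is not shown, and the `label`, `children`, `dump`, and `render` helpers are assumptions made for illustration.

```rust
// Reduced, hypothetical mirror of a few `ASTNode` variants.
#[derive(Debug)]
enum Node {
    IntNumber(i64),
    FloatNumber(f64),
    Identifier(String),
    Op(String, Vec<Node>),
    Let(String, Vec<Node>),
    Print(Vec<Node>),
}

// Text shown for a node, without its children.
fn label(node: &Node) -> String {
    match node {
        Node::IntNumber(n) => format!("IntNumber({})", n),
        Node::FloatNumber(x) => format!("FloatNumber({})", x),
        Node::Identifier(s) => format!("Identifier({})", s),
        Node::Op(op, _) => format!("Op({})", op),
        Node::Let(name, _) => format!("Let({})", name),
        Node::Print(_) => "Print".to_string(),
    }
}

// Child nodes; leaves have none.
fn children(node: &Node) -> &[Node] {
    match node {
        Node::Op(_, kids) | Node::Let(_, kids) | Node::Print(kids) => kids.as_slice(),
        _ => &[],
    }
}

// Depth-first dump, indenting two spaces per nesting level.
fn dump(node: &Node, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&label(node));
    out.push('\n');
    for child in children(node) {
        dump(child, depth + 1, out);
    }
}

fn render(program: &[Node]) -> String {
    let mut out = String::new();
    for node in program {
        dump(node, 0, &mut out);
    }
    out
}

// Hand-built tree for: let a = 4.0; let b = 2**2; print(a + b);
fn sample() -> Vec<Node> {
    vec![
        Node::Let("a".into(), vec![Node::FloatNumber(4.0)]),
        Node::Let(
            "b".into(),
            vec![Node::Op(
                "Power".into(),
                vec![Node::IntNumber(2), Node::IntNumber(2)],
            )],
        ),
        Node::Print(vec![Node::Op(
            "Add".into(),
            vec![Node::Identifier("a".into()), Node::Identifier("b".into())],
        )]),
    ]
}

fn main() {
    print!("{}", render(&sample()));
}
```

Storing children in a `Vec` keeps every variant the same shape, so one recursive function can walk the whole tree.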
## Code Generation
The code generation phase transforms the AST into bytecode that can be executed by the Virtual Machine. This process is handled by the `Compiler` struct:
```rust
pub struct Compiler {
    chunk: Chunk,
    interner: Interner,
    locals: Vec,
    local_count: usize,
    scope_depth: u8,
    functions: Vec,
    function_count: usize,
}
```

The compiler emits bytecode instructions represented by the `OpCode` enum:
```rust
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
#[repr(u8)]
pub enum OpCode {
    OpConstant,
    OpNil,
    OpTrue,
    OpFalse,
    // ... (other opcodes)
}
```

These instructions, along with their operands, are stored in a `Chunk`:
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Chunk {
    pub code: Vec<VectorType>,
    pub constants: Vec,
}
```

The `VectorType` enum lets opcodes and constant indices be stored in the same `code` vector.
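A minimal sketch of this stage, assuming a post-order walk over a tiny expression tree, might look like the following. The `Expr` type, the `Code` and `Constant` variants of `VectorType`, the `OpAdd`, `OpMultiply`, and `OpReturn` opcodes, the `f64` constant pool, and the helper functions are all assumptions; the project's compiler handles many more node types.

```rust
// Assumed opcode subset; only `OpConstant` appears in the excerpt above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpCode {
    OpConstant,
    OpAdd,
    OpMultiply,
    OpReturn,
}

// Assumed shape of `VectorType`: an opcode or an index into the constant pool.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VectorType {
    Code(OpCode),
    Constant(usize),
}

#[derive(Debug, Default)]
struct Chunk {
    code: Vec<VectorType>,
    constants: Vec<f64>,
}

impl Chunk {
    // Store a constant and return its index in the pool.
    fn add_constant(&mut self, value: f64) -> usize {
        self.constants.push(value);
        self.constants.len() - 1
    }

    fn emit(&mut self, op: OpCode) {
        self.code.push(VectorType::Code(op));
    }
}

// Hypothetical expression tree, much smaller than `ASTNode`.
enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Post-order walk: emit code for both operands, then the operator, so that
// a stack machine finds the operands on its stack when the operator runs.
fn compile_expr(expr: &Expr, chunk: &mut Chunk) {
    match expr {
        Expr::Number(n) => {
            let index = chunk.add_constant(*n);
            chunk.emit(OpCode::OpConstant);
            chunk.code.push(VectorType::Constant(index));
        }
        Expr::Add(lhs, rhs) => {
            compile_expr(lhs, chunk);
            compile_expr(rhs, chunk);
            chunk.emit(OpCode::OpAdd);
        }
        Expr::Mul(lhs, rhs) => {
            compile_expr(lhs, chunk);
            compile_expr(rhs, chunk);
            chunk.emit(OpCode::OpMultiply);
        }
    }
}

fn compile(expr: &Expr) -> Chunk {
    let mut chunk = Chunk::default();
    compile_expr(expr, &mut chunk);
    chunk.emit(OpCode::OpReturn);
    chunk
}

fn main() {
    // 1 + 2 * 3
    let expr = Expr::Add(
        Box::new(Expr::Number(1.0)),
        Box::new(Expr::Mul(
            Box::new(Expr::Number(2.0)),
            Box::new(Expr::Number(3.0)),
        )),
    );
    println!("{:#?}", compile(&expr));
}
```

Because `OpConstant` is always followed by its index, a reader of `code` knows to consume two entries for that instruction and one for the others.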
## Virtual Machine
The Virtual Machine (VM) executes the generated bytecode. It is implemented in the `VM` struct:
```rust
pub struct VM {
    pub chunk: Chunk,
    ip: usize,
    stack: [ValueType; STACK_MAX],
    stack_top: usize,
    pub interner: Interner,
    globals: HashMap,
    call_frames: Vec,
    frame_index: usize,
}
```

The VM uses a stack-based architecture for executing instructions. It maintains a stack for operands and local variables, a global variable table, and call frames for function calls.
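The operand-stack part of this design can be sketched as follows. This is a simplified model under stated assumptions: the `Instr` enum carries its operand inline instead of using a flat opcode vector, values are plain `f64`, printed values are collected in an `output` field, and globals and call frames are left out. None of these names come from the project.

```rust
const STACK_MAX: usize = 256;

// Hypothetical instruction set; the operand travels with the instruction.
#[derive(Debug, Clone, Copy)]
enum Instr {
    Constant(usize), // push constants[index]
    Add,
    Power,
    Print,
    Return,
}

struct Vm {
    code: Vec<Instr>,
    constants: Vec<f64>,
    ip: usize,
    stack: [f64; STACK_MAX],
    stack_top: usize,
    output: Vec<String>,
}

impl Vm {
    fn new(code: Vec<Instr>, constants: Vec<f64>) -> Self {
        Vm {
            code,
            constants,
            ip: 0,
            stack: [0.0; STACK_MAX],
            stack_top: 0,
            output: Vec::new(),
        }
    }

    fn push(&mut self, value: f64) -> Result<(), String> {
        if self.stack_top == STACK_MAX {
            return Err("stack overflow".into());
        }
        self.stack[self.stack_top] = value;
        self.stack_top += 1;
        Ok(())
    }

    fn pop(&mut self) -> Result<f64, String> {
        if self.stack_top == 0 {
            return Err("stack underflow".into());
        }
        self.stack_top -= 1;
        Ok(self.stack[self.stack_top])
    }

    // Fetch, advance the instruction pointer, then dispatch on the opcode.
    fn run(&mut self) -> Result<(), String> {
        loop {
            let instr = *self.code.get(self.ip).ok_or("ran past end of code")?;
            self.ip += 1;
            match instr {
                Instr::Constant(index) => {
                    let value = *self.constants.get(index).ok_or("bad constant index")?;
                    self.push(value)?;
                }
                Instr::Add => {
                    // The right operand was pushed last, so it is popped first.
                    let b = self.pop()?;
                    let a = self.pop()?;
                    self.push(a + b)?;
                }
                Instr::Power => {
                    let b = self.pop()?;
                    let a = self.pop()?;
                    self.push(a.powf(b))?;
                }
                Instr::Print => {
                    let value = self.pop()?;
                    self.output.push(value.to_string());
                }
                Instr::Return => return Ok(()),
            }
        }
    }
}

fn main() {
    // Computes and prints 4.0 + 2 ** 2.
    let code = vec![
        Instr::Constant(0),
        Instr::Constant(1),
        Instr::Constant(2),
        Instr::Power,
        Instr::Add,
        Instr::Print,
        Instr::Return,
    ];
    let mut vm = Vm::new(code, vec![4.0, 2.0, 2.0]);
    vm.run().expect("program should run");
    for line in &vm.output {
        println!("{}", line);
    }
}
```

A fixed-size array with a `stack_top` index avoids heap allocation on every push, at the cost of needing an explicit overflow check.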
The main execution loop of the VM interprets each opcode and performs the corresponding operation:
```rust
pub fn run(&mut self) -> Result {
    // ... (main execution loop)
}
```

## String Interning
To optimize string handling, the compiler uses string interning via the `Interner` struct:
```rust
pub struct Interner {
    pub map: HashMap,
    vec: Vec,
}
```

Each distinct string is stored once and assigned a unique index, so comparing two interned strings reduces to comparing two integers.
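A minimal interner along these lines could look like the sketch below. The `u32` index type, the key and value types of the map, and the `intern` and `lookup` method names are assumptions, since the excerpt above does not show them.

```rust
use std::collections::HashMap;

// Assumed field types: string -> index in `map`, index -> string in `vec`.
#[derive(Default)]
pub struct Interner {
    map: HashMap<String, u32>,
    vec: Vec<String>,
}

impl Interner {
    // Return the existing index for `s`, or assign the next free one.
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&index) = self.map.get(s) {
            return index;
        }
        let index = self.vec.len() as u32;
        self.map.insert(s.to_string(), index);
        self.vec.push(s.to_string());
        index
    }

    // Recover the original text for an index returned by `intern`.
    pub fn lookup(&self, index: u32) -> &str {
        &self.vec[index as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("a");
    let b = interner.intern("b");
    let a_again = interner.intern("a");
    println!("a = {}, b = {}, a again = {}", a, b, a_again);
    println!("index {} is {:?}", b, interner.lookup(b));
}
```

This version stores each string twice, once as a map key and once in the vector, which keeps the code simple; a production interner would typically avoid the duplicate allocation.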
## Example: Program Compilation and Execution
Here's a simple program to demonstrate how it progresses through each stage of the compiler.
### Sample Program
```
print("Hello, world!");

let a = 4.0;
let b = 2**2;

print(a + b);
```

### Stage 1: Lexical Analysis
The lexer breaks down the program into tokens:
```
[
Token { token_type: PRINT, lexeme: "print", span: 0..5 }
Token { token_type: LeftParen, lexeme: "(", span: 5..6 }
Token { token_type: String, lexeme: "\"Hello, world!\"", span: 6..21 }
Token { token_type: RightParen, lexeme: ")", span: 21..22 }
Token { token_type: SEMICOLON, lexeme: ";", span: 22..23 }
Token { token_type: LET, lexeme: "let", span: 25..28 }
Token { token_type: Identifier, lexeme: "a", span: 29..30 }
...
]
```

### Stage 2: Parsing and AST Generation
The parser creates an Abstract Syntax Tree:
```
String(""Hello, world!"")
Let(a)
FloatNumber(4)
Let(b)
Op(PostfixOp(StarStar))
IntNumber(2)
IntNumber(2)
Op(BinaryOp(Add))
Identifier(a)
Identifier(b)
```

### Stage 3: Code Generation
The compiler generates bytecode:
```
0000 OP_CONSTANT 0 | intr->"Hello, world!"
0002 OP_PRINT
0003 OP_CONSTANT 2 | 4
0005 OP_DEFINE_GLOBAL 1 | intr->a
0007 OP_CONSTANT 4 | 2
0009 OP_CONSTANT 5 | 2
0011 OP_POWER
0012 OP_DEFINE_GLOBAL 3 | intr->b
0014 OP_GET_GLOBAL 6 | intr->a
0016 OP_GET_GLOBAL 7 | intr->b
0018 OP_ADD
0019 OP_PRINT
0020 OP_RETURN
```

### Stage 4: Execution
The VM executes the bytecode, resulting in the output:
```
Hello, world!
8
```

## Future Improvements
While this compiler implements the core functionality, several areas could be improved:
1. **Type System**: Implement a static type system with type inference for improved safety and performance.
2. **Optimization**: Add optimization passes to improve the generated bytecode's efficiency.
3. **Error Handling**: Enhance error reporting with more detailed messages and source code locations.
4. **Garbage Collection**: Implement a garbage collector for automatic memory management.
5. **REPL**: Implement a Read-Eval-Print Loop for interactive programming.