https://github.com/shubhamai/grad

A custom programming language grad and its compiler, written in Rust
Host: GitHub
URL: https://github.com/shubhamai/grad
Owner: Shubhamai
Created: 2024-04-11T17:12:12.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-07-01T15:14:41.000Z (10 months ago)
Last Synced: 2025-02-06T05:45:17.759Z (3 months ago)
Topics: egui, rust
Language: Rust
Homepage: https://grad-lang.vercel.app
Size: 2.83 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0

# grad

This project implements `grad`, a custom programming language, and its compiler, written in Rust. The compiler transforms source code into bytecode in several stages, and a custom stack-based virtual machine (VM) then interprets that bytecode.

## Getting Started

Try the language in the [playground](https://grad-lang.vercel.app).

### Example

```bash
cargo install grad
echo "let a = 10; print(a);" > example.grad
grad run example.grad
```

## Table of Contents

1. [Compiler Overview](#compiler-overview)

2. [Lexical Analysis](#lexical-analysis)

3. [Parsing](#parsing)

4. [Abstract Syntax Tree (AST)](#abstract-syntax-tree-ast)

5. [Code Generation](#code-generation)

6. [Virtual Machine](#virtual-machine)

7. [String Interning](#string-interning)

8. [Example: Program Compilation and Execution](#example-program-compilation-and-execution)

9. [Future Improvements](#future-improvements)

## Compiler Overview

The compiler follows these main stages:

1. [Lexical Analysis](./src/scanner.rs) - Tokenizes the input source code.

2. [Parsing](./src/ast.rs) - Builds an Abstract Syntax Tree (AST).

3. [Code Generation](./src/compiler.rs) - Transforms the AST into bytecode.

4. [Virtual Machine](./src/vm.rs) - Executes the generated bytecode.

## Lexical Analysis

The lexical analysis is performed by the `Lexer` struct. It tokenizes/splits the input source code into a series of [`Token`s](./src/scanner.rs).

```rust
pub struct Lexer {
    pub tokens: Vec<Token>,
}

#[derive(Debug, PartialEq, Clone)]
pub struct Token {
    pub token_type: TokenType,
    pub lexeme: String,
    pub span: std::ops::Range<usize>,
}
```

The `Lexer` uses [logos](https://github.com/maciejhirsz/logos) to identify different token types such as keywords, identifiers, literals, and operators. It also returns the span (start and end positions) of each token in the source code.
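To make the idea of a token stream with spans concrete, here is a minimal hand-written scanner that uses only the standard library. It is a sketch, not the project's `Lexer`: it does not use logos, and the `TokenType` variants and the `tokenize` function are hypothetical names chosen for illustration.

```rust
use std::ops::Range;

// Hypothetical token kinds for this sketch; the real `TokenType` has more variants.
#[derive(Debug, PartialEq, Clone)]
pub enum TokenType {
    Let,
    Print,
    Identifier,
    Number,
    Equal,
    LeftParen,
    RightParen,
    Semicolon,
}

#[derive(Debug, PartialEq, Clone)]
pub struct Token {
    pub token_type: TokenType,
    pub lexeme: String,
    pub span: Range<usize>,
}

// Splits `source` into tokens and records the byte range each one covers.
// Any character the sketch does not know produces an error with its offset.
pub fn tokenize(source: &str) -> Result<Vec<Token>, String> {
    let bytes = source.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let c = bytes[i] as char;
        if c.is_ascii_whitespace() {
            i += 1;
            continue;
        }
        let token_type = if c.is_ascii_alphabetic() || c == '_' {
            // Keyword or identifier: consume letters, digits, and underscores.
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            match &source[start..i] {
                "let" => TokenType::Let,
                "print" => TokenType::Print,
                _ => TokenType::Identifier,
            }
        } else if c.is_ascii_digit() {
            // Digits and dots; malformed numbers such as `1.2.3` are not rejected here.
            while i < bytes.len() && (bytes[i].is_ascii_digit() || bytes[i] == b'.') {
                i += 1;
            }
            TokenType::Number
        } else {
            i += 1;
            match c {
                '=' => TokenType::Equal,
                '(' => TokenType::LeftParen,
                ')' => TokenType::RightParen,
                ';' => TokenType::Semicolon,
                _ => return Err(format!("unexpected character {:?} at byte {}", c, start)),
            }
        };
        tokens.push(Token {
            token_type,
            lexeme: source[start..i].to_string(),
            span: start..i,
        });
    }
    Ok(tokens)
}
```

Calling `tokenize("let a = 10; print(a);")` yields ten tokens, each carrying the byte range it was read from, which is the information a parser needs to report error locations.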

## Parsing

The parsing stage is implemented using a recursive descent parser with [Pratt parsing](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html) for expressions.

```rust
pub struct Parser<'a> {
    lexer: &'a mut Lexer,
}
```

The parser uses methods like `parse_statement()`, `parse_expression()`, and various other parsing functions to build the Abstract Syntax Tree (AST).

The expression parsing uses the [Pratt parsing](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html) technique for handling operator precedence:

```rust
fn expr_bp(lexer: &mut Lexer, min_bp: u8) -> ParseResult {
    // ... (Pratt parsing implementation by matklad)
}
```

Each operator is assigned a binding power, so expressions that mix operators of different precedence parse correctly without a separate grammar rule per precedence level.
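The following is a minimal sketch of Pratt parsing in the style of the linked article, not the project's `expr_bp`. It parses arithmetic into an S-expression string so that the grouping is visible. The `Tok` type, the `lex` helper, the `parse_to_sexpr` entry point, and the binding-power values are all assumptions made for this example, including the choice to treat `**` as right-associative.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(String),
    Op(&'static str),
    LParen,
    RParen,
}

// Tiny tokenizer for the sketch: numbers, + - * / **, and parentheses.
fn lex(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            out.push(Tok::Num(chars[start..i].iter().collect()));
        } else if c == '*' && chars.get(i + 1) == Some(&'*') {
            out.push(Tok::Op("**"));
            i += 2;
        } else {
            out.push(match c {
                '+' => Tok::Op("+"),
                '-' => Tok::Op("-"),
                '*' => Tok::Op("*"),
                '/' => Tok::Op("/"),
                '(' => Tok::LParen,
                ')' => Tok::RParen,
                _ => panic!("unexpected character {:?}", c),
            });
            i += 1;
        }
    }
    out
}

// (left, right) binding powers. A right power lower than the left one
// makes the operator right-associative.
fn infix_binding_power(op: &str) -> Option<(u8, u8)> {
    match op {
        "+" | "-" => Some((1, 2)),
        "*" | "/" => Some((3, 4)),
        "**" => Some((6, 5)),
        _ => None,
    }
}

struct Parser {
    tokens: Vec<Tok>,
    pos: usize,
}

impl Parser {
    fn advance(&mut self) -> Option<Tok> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    // Parses an expression whose operators all bind at least as tightly as `min_bp`.
    fn expr_bp(&mut self, min_bp: u8) -> String {
        let mut lhs = match self.advance() {
            Some(Tok::Num(n)) => n,
            Some(Tok::LParen) => {
                let inner = self.expr_bp(0);
                assert_eq!(self.advance(), Some(Tok::RParen), "expected ')'");
                inner
            }
            other => panic!("unexpected token {:?}", other),
        };
        loop {
            let op = match self.tokens.get(self.pos) {
                Some(Tok::Op(op)) => *op,
                _ => break,
            };
            let (l_bp, r_bp) = infix_binding_power(op).expect("unknown operator");
            if l_bp < min_bp {
                break;
            }
            self.advance();
            let rhs = self.expr_bp(r_bp);
            lhs = format!("({} {} {})", op, lhs, rhs);
        }
        lhs
    }
}

pub fn parse_to_sexpr(src: &str) -> String {
    let mut parser = Parser { tokens: lex(src), pos: 0 };
    parser.expr_bp(0)
}
```

With these binding powers, `parse_to_sexpr("1 + 2 * 3")` returns `(+ 1 (* 2 3))` and `parse_to_sexpr("2 ** 3 ** 2")` returns `(** 2 (** 3 2))`. A real parser would build AST nodes instead of strings and return errors rather than panic.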

## Abstract Syntax Tree (AST)

The AST is represented using the `ASTNode` enum:

```rust
pub enum ASTNode {
    IntNumber(i64),
    FloatNumber(f64),
    Identifier(String),
    Boolean(bool),
    String(String),
    Op(Ops, Vec<ASTNode>),
    Callee(String, Vec<ASTNode>),
    Let(String, Vec<ASTNode>),
    Assign(String, Vec<ASTNode>),
    If(Vec<ASTNode>, Vec<ASTNode>, Option<Vec<ASTNode>>),
    While(Vec<ASTNode>, Vec<ASTNode>),
    Print(Vec<ASTNode>),
    Function(String, Vec, Vec<ASTNode>),
    Block(Vec<ASTNode>),
}
```

This structure allows for representing various language constructs, including literals, variables, function calls, control flow statements, and more.
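As an illustration of how such a tree can be built and inspected, the sketch below defines a cut-down `ASTNode` and prints a node as an indented tree. The reduced variant set, the `Ops` variants, and the `children` and `render` helpers are assumptions for this example and are not taken from the project.

```rust
// Simplified stand-ins; the project's `ASTNode` and `Ops` have more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Ops {
    Add,
    Power,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ASTNode {
    IntNumber(i64),
    FloatNumber(f64),
    Identifier(String),
    Op(Ops, Vec<ASTNode>),
    Let(String, Vec<ASTNode>),
    Print(Vec<ASTNode>),
}

// Returns the child nodes of `node`, or an empty slice for leaves.
fn children(node: &ASTNode) -> &[ASTNode] {
    match node {
        ASTNode::Op(_, c) | ASTNode::Let(_, c) | ASTNode::Print(c) => c.as_slice(),
        _ => &[],
    }
}

// Appends `node` and its descendants to `out`, indenting two spaces per level.
pub fn render(node: &ASTNode, depth: usize, out: &mut String) {
    let label = match node {
        ASTNode::IntNumber(n) => format!("IntNumber({})", n),
        ASTNode::FloatNumber(x) => format!("FloatNumber({})", x),
        ASTNode::Identifier(name) => format!("Identifier({})", name),
        ASTNode::Op(op, _) => format!("Op({:?})", op),
        ASTNode::Let(name, _) => format!("Let({})", name),
        ASTNode::Print(_) => "Print".to_string(),
    };
    out.push_str(&"  ".repeat(depth));
    out.push_str(&label);
    out.push('\n');
    for child in children(node) {
        render(child, depth + 1, out);
    }
}
```

Rendering `Let("b", [Op(Power, [IntNumber(2), IntNumber(2)])])` at depth 0 produces `Let(b)` on the first line, `Op(Power)` indented beneath it, and the two `IntNumber(2)` operands one level deeper.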

## Code Generation

The code generation phase transforms the AST into bytecode that can be executed by the Virtual Machine. This process is handled by the `Compiler` struct:

```rust
pub struct Compiler {
    chunk: Chunk,
    interner: Interner,
    locals: Vec,
    local_count: usize,
    scope_depth: u8,
    functions: Vec,
    function_count: usize,
}
```

The compiler emits bytecode instructions represented by the `OpCode` enum:

```rust
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
#[repr(u8)]
pub enum OpCode {
    OpConstant,
    OpNil,
    OpTrue,
    OpFalse,
    // ... (other opcodes)
}
```

These instructions, along with their operands, are stored in a `Chunk`:

```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Chunk {
    pub code: Vec<VectorType>,
    pub constants: Vec,
}
```

The `VectorType` enum allows for storage of both opcodes and constant indices in the same vector.
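A rough sketch of how an expression tree might be lowered into such a chunk is shown below. It is not the project's `Compiler`: the `Expr` type, the `VectorType` variant names `Code` and `Constant`, the use of `f64` for constants, and the `compile_expr` and `compile_print` functions are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode {
    OpConstant,
    OpAdd,
    OpPower,
    OpPrint,
    OpReturn,
}

// Assumed shape: each entry is either an opcode or an index into `constants`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VectorType {
    Code(OpCode),
    Constant(usize),
}

#[derive(Debug, Default)]
pub struct Chunk {
    pub code: Vec<VectorType>,
    pub constants: Vec<f64>,
}

impl Chunk {
    fn emit(&mut self, op: OpCode) {
        self.code.push(VectorType::Code(op));
    }

    // Stores `value` in the constant table and emits OpConstant plus its index.
    fn emit_constant(&mut self, value: f64) {
        self.constants.push(value);
        self.emit(OpCode::OpConstant);
        self.code.push(VectorType::Constant(self.constants.len() - 1));
    }
}

// Hypothetical expression type standing in for the AST.
pub enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
    Power(Box<Expr>, Box<Expr>),
}

// Post-order walk: emit both operands first, then the operator,
// so a stack machine finds its operands on top of the stack.
pub fn compile_expr(expr: &Expr, chunk: &mut Chunk) {
    match expr {
        Expr::Number(n) => chunk.emit_constant(*n),
        Expr::Add(lhs, rhs) => {
            compile_expr(lhs, chunk);
            compile_expr(rhs, chunk);
            chunk.emit(OpCode::OpAdd);
        }
        Expr::Power(lhs, rhs) => {
            compile_expr(lhs, chunk);
            compile_expr(rhs, chunk);
            chunk.emit(OpCode::OpPower);
        }
    }
}

// Compiles `print(expr);` followed by a return.
pub fn compile_print(expr: &Expr) -> Chunk {
    let mut chunk = Chunk::default();
    compile_expr(expr, &mut chunk);
    chunk.emit(OpCode::OpPrint);
    chunk.emit(OpCode::OpReturn);
    chunk
}
```

Compiling the expression `4 + 2 ** 2` this way emits three `OpConstant` instructions with their indices, then `OpPower`, `OpAdd`, `OpPrint`, and `OpReturn`, for ten entries in `code` and three entries in `constants`.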

## Virtual Machine

The Virtual Machine (VM) executes the generated bytecode. It is implemented in the `VM` struct:

```rust
pub struct VM {
    pub chunk: Chunk,
    ip: usize,
    stack: [ValueType; STACK_MAX],
    stack_top: usize,
    pub interner: Interner,
    globals: HashMap,
    call_frames: Vec,
    frame_index: usize,
}
```

The VM uses a stack-based architecture for executing instructions. It maintains a stack for operands and local variables, a global variable table, and call frames for function calls.

The main execution loop of the VM interprets each opcode and performs the corresponding operation:

```rust
pub fn run(&mut self) -> Result {
    // ... (main execution loop)
}
```
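To show what a fetch-decode-execute loop over such bytecode can look like, here is a much smaller stack machine. It is a sketch under several assumptions rather than the project's `VM`: it supports only five opcodes, uses a growable `Vec<f64>` instead of a fixed-size `ValueType` array, has no globals or call frames, and collects printed values in a hypothetical `output` field instead of writing to stdout.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode {
    OpConstant,
    OpAdd,
    OpPower,
    OpPrint,
    OpReturn,
}

// Assumed shape: each entry is either an opcode or an index into `constants`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VectorType {
    Code(OpCode),
    Constant(usize),
}

pub struct Chunk {
    pub code: Vec<VectorType>,
    pub constants: Vec<f64>,
}

pub struct VM {
    chunk: Chunk,
    ip: usize,
    stack: Vec<f64>,
    pub output: Vec<String>,
}

impl VM {
    pub fn new(chunk: Chunk) -> Self {
        VM { chunk, ip: 0, stack: Vec::new(), output: Vec::new() }
    }

    fn pop(&mut self) -> Result<f64, String> {
        self.stack.pop().ok_or_else(|| "stack underflow".to_string())
    }

    // Fetches one opcode at a time and applies it to the operand stack
    // until OpReturn is reached or an error occurs.
    pub fn run(&mut self) -> Result<(), String> {
        loop {
            let op = match self.chunk.code.get(self.ip) {
                Some(VectorType::Code(op)) => *op,
                Some(VectorType::Constant(_)) => {
                    return Err(format!("expected opcode at offset {}", self.ip))
                }
                None => return Err("ran past the end of the chunk".to_string()),
            };
            self.ip += 1;
            match op {
                OpCode::OpConstant => {
                    // The operand that follows is an index into the constant table.
                    let idx = match self.chunk.code.get(self.ip) {
                        Some(VectorType::Constant(idx)) => *idx,
                        _ => return Err(format!("expected constant index at offset {}", self.ip)),
                    };
                    self.ip += 1;
                    let value = *self.chunk.constants.get(idx).ok_or("bad constant index")?;
                    self.stack.push(value);
                }
                OpCode::OpAdd => {
                    let b = self.pop()?;
                    let a = self.pop()?;
                    self.stack.push(a + b);
                }
                OpCode::OpPower => {
                    let b = self.pop()?;
                    let a = self.pop()?;
                    self.stack.push(a.powf(b));
                }
                OpCode::OpPrint => {
                    let value = self.pop()?;
                    self.output.push(value.to_string());
                }
                OpCode::OpReturn => return Ok(()),
            }
        }
    }
}
```

Binary operators pop the right operand first because it was pushed last. Running a chunk that pushes `4`, `2`, and `2`, then applies `OpPower`, `OpAdd`, and `OpPrint`, leaves the single string `8` in `output`.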

## String Interning

To optimize string handling, the compiler uses string interning via the `Interner` struct:

```rust
pub struct Interner {
    pub map: HashMap,
    vec: Vec,
}
```

Each distinct string is stored once and assigned a unique index, so comparing two interned strings reduces to comparing their indices.
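A minimal interner along these lines could look like the sketch below. The field types (`HashMap<String, usize>` and `Vec<String>`) and the `intern` and `lookup` methods are assumptions for illustration; the project's actual key and index types are not shown in this README.

```rust
use std::collections::HashMap;

// Field types are assumptions for this sketch.
#[derive(Default)]
pub struct Interner {
    map: HashMap<String, usize>,
    vec: Vec<String>,
}

impl Interner {
    // Returns the index for `s`, inserting it on first sight.
    pub fn intern(&mut self, s: &str) -> usize {
        if let Some(&idx) = self.map.get(s) {
            return idx;
        }
        let idx = self.vec.len();
        self.map.insert(s.to_string(), idx);
        self.vec.push(s.to_string());
        idx
    }

    // Resolves an index back to its string. Panics if `idx` was never returned by `intern`.
    pub fn lookup(&self, idx: usize) -> &str {
        &self.vec[idx]
    }
}
```

Interning the same name twice returns the same index, so a global lookup in the VM can key on a small integer rather than hashing or comparing full strings each time. This version stores every string twice for simplicity; a production interner would usually avoid the duplicate allocation.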

## Example: Program Compilation and Execution

Here's a simple program to demonstrate how it progresses through each stage of the compiler.

### Sample Program

```

print("Hello, world!");

let a = 4.0;

let b = 2**2;

print(a + b);

```

### Stage 1: Lexical Analysis

The lexer breaks down the program into tokens:

```
[
    Token { token_type: PRINT, lexeme: "print", span: 0..5 }
    Token { token_type: LeftParen, lexeme: "(", span: 5..6 }
    Token { token_type: String, lexeme: "\"Hello, world!\"", span: 6..21 }
    Token { token_type: RightParen, lexeme: ")", span: 21..22 }
    Token { token_type: SEMICOLON, lexeme: ";", span: 22..23 }
    Token { token_type: LET, lexeme: "let", span: 25..28 }
    Token { token_type: Identifier, lexeme: "a", span: 29..30 }
    ...
]
```

### Stage 2: Parsing and AST Generation

The parser creates an Abstract Syntax Tree:

```
Print
  String(""Hello, world!"")
Let(a)
  FloatNumber(4)
Let(b)
  Op(PostfixOp(StarStar))
    IntNumber(2)
    IntNumber(2)
Print
  Op(BinaryOp(Add))
    Identifier(a)
    Identifier(b)
```

### Stage 3: Code Generation

The compiler generates bytecode:

```
0000 OP_CONSTANT                    0 | intr->"Hello, world!"
0002 OP_PRINT
0003 OP_CONSTANT                    2 | 4
0005 OP_DEFINE_GLOBAL               1 | intr->a
0007 OP_CONSTANT                    4 | 2
0009 OP_CONSTANT                    5 | 2
0011 OP_POWER
0012 OP_DEFINE_GLOBAL               3 | intr->b
0014 OP_GET_GLOBAL                  6 | intr->a
0016 OP_GET_GLOBAL                  7 | intr->b
0018 OP_ADD
0019 OP_PRINT
0020 OP_RETURN
```

### Stage 4: Execution

The VM executes the bytecode, resulting in the output:

```
Hello, world!
8
```

## Future Improvements

While this compiler implements core functionality, there are several areas for potential improvement:

1. **Type System**: Implement a static type system with type inference for improved safety and performance.

2. **Optimization**: Add optimization passes to improve the generated bytecode's efficiency.

3. **Error Handling**: Enhance error reporting with more detailed messages and source code locations.

4. **Garbage Collection**: Implement a garbage collector for automatic memory management.

5. **REPL**: Implement a Read-Eval-Print Loop for interactive programming.