Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/dreden33773/pl_0

a simple pl/0 compiler implemented in rust.
https://github.com/dreden33773/pl_0

compiler compiler-design compiler-principles curriculum-design nuaa pl0 rust

Last synced: 2 days ago
JSON representation

a simple pl/0 compiler implemented in rust.

Awesome Lists containing this project

README

        

# PL/0 (aka. PL_0)

> ## ❤️ Please give me a `Star` / `Follow` if you like this project! ❤️

## To begin with

This is the `curriculum design` of `Compiler Principle` course in `Nanjing University of Aeronautics and Astronautics` (
aka. `NUAA`).

## Introduction

`PL/0` is a `subset language` of `Pascal`.

This is a simple `Rust` implementation of `PL/0` compiler.

## BNF

```bnf
-> program ;
-> [][][]
-> const {, } ;
-> :=
-> var {, } ;
-> procedure ([ {, }]) ; {; }
-> begin {; } end
-> :=
| if then [else ]
| while do
| call ([ {, }])
|
| read ( {, })
| write ( {, })
-> | odd
-> [+|-] { }
-> { }
-> | | ()
-> = | <> | < | <= | > | >=
-> + | -
-> * | /
-> { | }
-> {}
-> a | b | ... | z | A | B | ... | Z
-> 0 | 1 | ... | 9
```

## Structure

$$
\set{Source Code} \Longrightarrow \textbf{Lexer} \stackrel{Token}{\Longrightarrow} \textbf{Parser} \stackrel{AST}{\Longrightarrow} \textbf{CodeGen} \Longrightarrow \set{PCode} \longrightarrow \textbf{VM} \longrightarrow \set{Result}
$$

| Part | Analysis List |
|:-------:|:-------------------:|
| Lexer | `Lexical Analysis` |
| Parser | `Syntax Analysis` |
| CodeGen | `Semantic Analysis` |

## Overview

### Lexer/Tokenizer

This part is extreme easy, I've implemented it in my own hand without using any other tools.

(However, if you'd love to, you could use tools like `flex` or `pest` to generate `lexer/tokenizer` automatically)

### Parser

With the help of `Recursive Descent Algorithm`, `parser` is also not that hard to implement.

However, it's necessary to prove that the given [BNF](#bnf) satisfy the definition of `LL(1)` before implementing
`parser` in `Recursive Descent Algorithm`.

Proof will be given later.

### Error Handling

I've adopted the welcomed `panic-mode-liked` error handling strategy for this part, to make sure that the `compiler`
could find as many errors as possible in one run, instead of being halted by the first error.

To make sure error could be handled in a `synchronous` way, `FIRST-FOLLOW` table is a must (I've built this manually,
which could be further improved by using auto-tools).

### Codegen

`AST` to `PCode` code-generator is the default strategy for this part.

I'm working on a `AST` to `Lua-Backend-Adapted-Representation` (LBAR) code-generator as well (not implemented yet).

### Virtual Machine (aka. VM / Interpreter)

Sense `PCode` is the default execution result of `codegen`, the `Simple-PCode-Interpreter` is the default implementation
of `Virtual Machine`

Still, I'm trying to implement a `Lua-VM-Liked-VM` for `LBAR`

## Feasibility Analysis

### Proof: [BNF](#bnf) is `LL(1)`

To satisfy this, 3 conditions should be met:

$$
\begin{align*}
\text{Condition 1} &~\dots~ \text{No \textit{left recursion pattern} detected in the \textit{grammar}} \\
\text{Condition 2} &~\dots~ \forall A \in V_N (A \rightarrow \alpha_1 | \alpha_2 | \dots | \alpha_n) \Rightarrow First(\alpha_i) \cap First(\alpha_j) = \Phi ~ (i \ne j) \\
\text{Condition 3} &~\dots~ \forall A \in V_N (\epsilon \in First(A)) \Rightarrow First(A) \cap Follow(A) = \Phi
\end{align*}
$$

Now, let's prove them one by one!

#### Condition 1 ~ No _left recursion pattern_ detected in the _grammar_

After having a glance of the given [BNF](#bnf), we could easily prove that:

$$
\forall A \in V_N (A \rightarrow B ~\wedge~ B \in V_N ) \Rightarrow A \ne B
$$

Which means that, there's no _left recursion pattern_ detected in the _grammar_.

#### Condition 2

This could be easy, with the reference of [BNF](#bnf) and [first_follow_table](./src/parser/synchronizer/tables.rs)

#### Condition 3

Just the same as `Condition 2`

## Fibonacci Demo

Source code:

```pascal

program fibonacci;

const index := 30;

var return,i,a;

procedure fib(a,x);

var sum;
begin
sum := 0;
if x<2 then
return := x
else
begin
call fib(a+1,x-1);
sum := sum+return;
call fib(a+1,x-2);
sum := sum+return;
return := sum
end
end

begin
i := 1;
a := 2;
while i<=index do
begin
call fib(a+1,i);
write(return);
i := i+1
end
end

```

Result:

- Console

```txt
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025
121393
196418
317811
514229
832040
```

- PCode

```txt
PCode List:
======================================================================
0| JMP 0 39
1| JMP 0 4
2| STA 1 4
3| STA 2 3
4| INT 0 6
5| LIT 0 0
6| STO 0 5
7| LOD 0 4
8| LIT 0 2
9| OPR 0 10
10| JPC 0 14
11| LOD 0 4
12| STO 1 3
13| JMP 0 38
14| LOD 0 3
15| LIT 0 1
16| OPR 0 2
17| LOD 0 4
18| LIT 0 1
19| OPR 0 3
20| CAL 1 2
21| LOD 0 5
22| LOD 1 3
23| OPR 0 2
24| STO 0 5
25| LOD 0 3
26| LIT 0 1
27| OPR 0 2
28| LOD 0 4
29| LIT 0 2
30| OPR 0 3
31| CAL 1 2
32| LOD 0 5
33| LOD 1 3
34| OPR 0 2
35| STO 0 5
36| LOD 0 5
37| STO 1 3
38| OPR 0 0
39| INT 0 7
40| LIT 0 1
41| STO 0 4
42| LIT 0 2
43| STO 0 5
44| LOD 0 4
45| LIT 0 30
46| OPR 0 13
47| JPC 0 61
48| LOD 0 5
49| LIT 0 1
50| OPR 0 2
51| LOD 0 4
52| CAL 0 2
53| LOD 0 3
54| OPR 0 14
55| OPR 0 15
56| LOD 0 4
57| LIT 0 1
58| OPR 0 2
59| STO 0 4
60| JMP 0 44
61| OPR 0 0
======================================================================
```

- Symbol Table

```txt
Symbol Table:
======================================================================
name | type | val | level | addr | size | scope_list
======================================================================
index | const | 30 | 0 | 3 | 0 | ["#"]
return | var | 0 | 0 | 3 | 0 | ["#"]
i | var | 0 | 0 | 4 | 0 | ["#"]
a | var | 0 | 0 | 5 | 0 | ["#"]
fib | proc | 2 | 0 | 6 | 2 | ["#"]
a | var | 0 | 1 | 3 | 0 | ["#", "fib"]
x | var | 0 | 1 | 4 | 0 | ["#", "fib"]
sum | var | 0 | 1 | 5 | 0 | ["#", "fib"]
======================================================================
```

## Error Handling Demos

As is mentioned follow, this implementation of pl/0 compiler has a complete error handling strategy, which means that it
could find as many errors as possible in one run, instead of being halted by the first error.

Here are some simple demos:

### Syntax Error (may coexists with `Lexical Error`)

- src

```pascal
program ;
var a, b, c;
begin
a 1;
b := ;
é : 3;
if 1 = 1 then
write(1
else
write 0);
write a + b + c;
wrçte(1)
end
```

- console

```txt
SyntaxError{ Line: 1, Col: 9 }
| ~~ Expected field, but not found!

SyntaxError{ Line: 4, Col: 8 }
| ~~ Expected `:=`, but got `Integer(1)`

SyntaxError{ Line: 5, Col: 9 }
| ~~ Expected `` / `` / `()` field, but got an unmatchable token `;`

LexicalError{ Line: 6, Col: 3 }
| ~~ 'é' is not an ASCII character

LexicalError{ Line: 6, Col: 5 }
| ~~ ':' is an undefined sign, did you mean ':='?

SyntaxError{ Line: 6, Col: 7 }
| ~~ Expected `:=`, but got `Integer(3)`

SyntaxError{ Line: 6, Col: 7 }
| ~~ Expected field, but not found!

SyntaxError{ Line: 9, Col: 6 }
| ~~ Expected `)`, but got `Else`

SyntaxError{ Line: 10, Col: 11 }
| ~~ Expected `(`, but got `Integer(0)`

SyntaxError{ Line: 11, Col: 9 }
| ~~ Expected `(`, but got `Identifier("a")`

SyntaxError{ Line: 11, Col: 18 }
| ~~ Expected `)`, but got `;`

LexicalError{ Line: 12, Col: 5 }
| ~~ 'ç' is not an ASCII character

SyntaxError{ Line: 12, Col: 7 }
| ~~ Expected `:=`, but got `Identifier("te")`

SyntaxError{ Line: 12, Col: 7 }
| ~~ Expected field, but not found!

thread 'main' panicked at src/parser/mod.rs:149:7:
|> Errors above occurred (during `parsing`), compiling stopped ... <|

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```

### Semantic Error

#### Duplicated Definition

- src

```pascal
program MultiDef;

var a, a, a, a;

procedure proc();
begin
write(1)
end;

procedure proc();
begin
write(2)
end

begin
write(1)
end
```

- console

```txt
SemanticError{ Line: 3, Col: 8 }
| ~~ `a` is defined before

SemanticError{ Line: 3, Col: 11 }
| ~~ `a` is defined before

SemanticError{ Line: 3, Col: 14 }
| ~~ `a` is defined before

SemanticError{ Line: 10, Col: 14 }
| ~~ `proc` is defined before

thread 'main' panicked at src/translator/mod.rs:116:7:
attempt to subtract with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

#### Undefined

- src

```pascal
program undef;
begin
a := 1;
b := 2;
write(c)
end
```

- console

```txt
SemanticError{ Line: 3, Col: 3 }
| ~~ `a` is undefined

SemanticError{ Line: 4, Col: 3 }
| ~~ `b` is undefined

SemanticError{ Line: 5, Col: 9 }
| ~~ `c` is undefined

thread 'main' panicked at src/translator/mod.rs:73:7:
|> Errors above occurred (during `translation/codegen`), compiling stopped ... <|

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

#### `args_list.length` cannot match with definition(signature)

- src

```pascal
program WrongArgsListLength;

var a;

procedure proc();
begin
write(1)
end;

procedure procc(x, t, z);
begin
write(1)
end

begin
call proc(1, 1, 1);
call procc(3)
end
```

- console

```txt
SemanticError{ Line: 16, Col: 11 }
| ~~ `proc` expects 0 args, but received 3

SemanticError{ Line: 17, Col: 12 }
| ~~ `procc` expects 3 args, but received 1

thread 'main' panicked at src/translator/mod.rs:73:7:
|> Errors above occurred (during `translation/codegen`), compiling stopped ... <|
```

#### Assign to `const` / `procedure`

- src

```pascal
program AssignToConstProc;
const i := 1;

procedure proc();
begin
write(i)
end

begin
i := 16;
proc := 16
end
```

- console

```txt
SemanticError{ Line: 10, Col: 3 }
| ~~ `i` is not a variable

SemanticError{ Line: 11, Col: 6 }
| ~~ `proc` is not a variable

thread 'main' panicked at src/translator/mod.rs:73:7:
|> Errors above occurred (during `translation/codegen`), compiling stopped ... <|
```