Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/l1mey112/tl-v

A tiny, proof of concept, programming language.
https://github.com/l1mey112/tl-v

ast compiler compiler-design gas programming-language vlang x86-64

Last synced: about 1 month ago
JSON representation

A tiny, proof of concept, programming language.

Awesome Lists containing this project

README

        

# t(iny)l(ang)

A tiny, proof of concept, programming language.

```
proc main {
r = 0;
}
```

The creation of `tl` and it's compiler were used to learn the basics of conventional AST based compilers after the completion of the stack based [`stas`](https://github.com/l1mey112/stas) programming language.

## Syntax

The syntax of of `tl` is a mingle of C with some alterations. For example, if statements and while loops do not need parentheses circling the condition.

You cannot declare any variables, as there are a limited amount of predefined ones.

```
s0, s1, s2, s3, s4, s5 | Stack variables
a0, a1, a2, a3, a4, a5 | Argument variables
r | Return variable
```

The stack variables are used to store local values. Use the argument variables to pass values to function calls, expect them to be destroyed on return. The return variable is used to pass return values on return.

Since function calls to not return values, they are not expressions. Use the `call` keyword to call a function.

```
call function;

s0 = function;
call s0;
```

There is only one builtin function, `print`. It is used to print the value of one number in `a0`.

Predefining an array of values is also supported using the `data` keyword, mainly used to test out pointer indexing.

Examples of `tl` programs are in the [`examples/`](./examples/) folder.

## The `tl` compiler

The `tl` compiler is split into 3 equally small files, the [lexer](./lexer.v), [parser](./parser.v) and [code generator](./gas.v).

## Lexer

The lexer is a construct that performs lexical analysis on a string of characters read from a file. The [`Lexer.get()`](./lexer.v#L23) function is used to scan and return a single token. It scans tokens linearly, with zero backtracking. No past tokens are stored, the parser constantly requests new tokens directly with the functions [`Lexer.expect()`](./lexer.v#142), [`Lexer.next()`](./lexer.v#147) and [`Lexer.curr()`](./lexer.v#153).

## Parser

The parser parses the stream of tokens into an [AST](https://en.wikipedia.org/wiki/Abstract_syntax_tree) representation. The entrypoint to the parser is the [`Parser.parse()`](./parser.v#281) function, returning the root AST node in the tree.

It is a recursive parser, starting by parsing toplevel statements such as function declaration and hardcoded arrays. Whilst inside a function body it then constantly calls the [`Parser.stmt()`](./parser.v#180) to build up all statements inside it. This function handles all language statements/constructs, like if statements and while loops. The [`Parser.expr()`](./parser.v#128) function is used to parse and build up an expression tree with respect to operator precedence. Each specific expression parser works by eating up a operator token until it encounters once with a higher precedence, then calling up the chain of expression parsing functions.

- [`Parser.expr()`](./parser.v#128) - Highest Right precedence (start here)
- `=`
- [`Parser.expr1()`](./parser.v#104) - Lowest Left precedence
- `>`, `>=`, `<`, `<=`, `==`, `!=`
- [`Parser.expr2()`](./parser.v#90) - Left precedence
- `+`, `-`
- [`Parser.expr3()`](./parser.v#76) - Left precedence
- `*`, `/`
- [`Parser.expr4()`](./parser.v#59) - Left precedence
- `index[]`
- [`Parser.term()`](./parser.v#42) - Highest precedence
- `123`, `variable`, `(paren expr)`

For example, this line of code will generate this AST.

```
proc main {
s0 = 10 + 2 * 4;
}
```
```
s_proc
expr
assign
ident `s0`
add
dec `10`
mul
dec `2`
dec `4`
```

## Code Generator

The code generator takes the tree representation of the program and generates x86_64 assembly of GNU dialect to be used with GCC.

The AST tree format is especially useful as it is an abstraction over the original source code, so the code generator can simply recurse over all nodes in the tree generating code. The entrypoint to the code generator is the [`Gen.generate_all()`](./gas.v#250).

The code generator takes inspiration from V's native backend, specifically with how expressions are handled. It is extremely simple, and generates a lot of useless code, however it gets the job done. Modern compiler optimisations require the AST to be converted to a lower level format first, adding a large amount of complexity and translations.

Expressions are handled in the [`Gen.expr()`](./gas.v#64) function. They are generated by traversing the left path of the current node first. On return, the left path's expression ends up in the `rax` register, it is then moved to the `rcx` register and right path of the current node is traversed. When done, the `rax` and `rcx` register contains the left path and the right path respectively and are used to perform an operation such as arithmetic or a comparison. This form of divide and conquer recursion allows the code generator to be quite lightweight.

```
proc main {
2 + 4 * 10;
}
```

```nasm
mov rax, 10
push rax
mov rax, 4
pop rcx
imul rax, rcx
push rax
mov rax, 2
pop rcx
add rax, rcx
```

## Building and compilation

Use the [`v`](https://vlang.io/) compiler to generate the compiler. Call the compiler with the program, then pipe the output to GCC.

```
v -g . -o tl
./tl main.tl | gcc -x assembler -
./a.out
```