https://github.com/valentin271/tbc
A TinyBASIC compiler in Rust
- Host: GitHub
- URL: https://github.com/valentin271/tbc
- Owner: Valentin271
- Created: 2023-12-19T19:14:01.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-12-20T11:52:51.000Z (over 2 years ago)
- Last Synced: 2025-03-31T22:39:41.655Z (about 1 year ago)
- Language: Rust
- Size: 21.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Tiny BASIC compiler
This is a compiler for a subset of the [Tiny BASIC](https://en.wikipedia.org/wiki/Tiny_BASIC)
language. It also dumps a number of other files related to the compilation toolchain.
It uses my library [tiny-elf](https://github.com/Valentin271/tiny-elf/) to manipulate assembly and generate ELF files.
## Parser generator
This project uses [pest](http://pest.rs/) to generate a parser from a grammar.
The grammar is defined in [`src/parser/tinybasic.pest`](src/parser/tinybasic.pest).
## Running
You can run the project by typing the following, followed by the path to a Tiny BASIC source file:
```sh
cargo run --
```
For example:
```sh
cargo run -- data/opt.tb
```
You can add the `-r` flag to run the compiled program directly. This also appends the program's return code to the output.
# Features
## Statements
This is an overview of the language, including limitations related to each statement.
| Statement        | Action                                                                                               |
| ---------------- | ---------------------------------------------------------------------------------------------------- |
| `PRINT`          | Prints an expression                                                                                 |
| `IF`, `ELSE`     | Classical conditional statement. The condition is limited compared to C-style languages, see NOTE 2. |
| `GOTO`, `GOSUB`  | Goes to the specified line. Unlike Tiny BASIC, this does not support `GOTO `                         |
| `INPUT`          | Stores an input into a variable. Currently only single-digit numbers are supported                   |
| `LET`            | Declares a variable. Variables cannot contain strings                                                |
| `END`, `RETURN`  | Ends the program. This is not the usual semantics of `RETURN`                                        |
NOTE: An expression is a string or an arithmetic expression. Expressions can contain variables,
and as such, an expression can be a variable. Strings cannot appear in an arithmetic expression.
NOTE 2: A condition consists of two operands separated by a relational operator. Each operand
can be a number or a variable, it **cannot** be an expression.
NOTE 3: `INPUT` must receive exactly one character. This means typing the value interactively in a
terminal will not work, as the trailing `\n` is also read. Instead you can pipe the input, for example `echo -n "5" | ./dump.elf`
## Optimizations
Assembly optimizations can be seen in [`optimize.rs`](src/optimize.rs). They mostly replace
an instruction or a sequence of instructions with faster ones.
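To illustrate the kind of rewrite meant here, below is a minimal peephole sketch. The `Instr` type and the two rules are assumptions made for the example; they are not the contents of `optimize.rs` nor the tiny-elf API.

```rust
/// Hypothetical, simplified instruction type (not tiny-elf's real API).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    MovImm { reg: &'static str, value: i64 },
    Xor { dst: &'static str, src: &'static str },
    AddImm { reg: &'static str, value: i64 },
    Inc { reg: &'static str },
}

/// Rewrites each instruction to a shorter equivalent when a rule matches.
/// Both rules change how CPU flags are set, so a real pass would first have
/// to make sure no later instruction depends on those flags.
fn peephole(code: &[Instr]) -> Vec<Instr> {
    code.iter()
        .map(|instr| match instr {
            // `mov reg, 0` becomes `xor reg, reg`
            Instr::MovImm { reg, value: 0 } => Instr::Xor { dst: *reg, src: *reg },
            // `add reg, 1` becomes `inc reg`
            Instr::AddImm { reg, value: 1 } => Instr::Inc { reg: *reg },
            // Anything else is kept unchanged
            other => other.clone(),
        })
        .collect()
}
```

Calling `peephole` on `mov rax, 0` followed by `mov rcx, 7` would rewrite only the first instruction.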
Structural optimizations, perhaps the most interesting ones, are scattered throughout the code
in `Optimize` implementations. The major ones are in [`expr.rs`](src/syntax_tree/expr.rs) and
[`stmt.rs`](src/syntax_tree/stmt.rs). Respectively, they can optimize away arithmetic expressions
and conditions at compile time.
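Optimizing away an arithmetic expression at compile time is usually called constant folding. The sketch below shows the idea on a hypothetical `Expr` type with a free `fold` function. Both are assumptions for illustration and do not reproduce the project's `Optimize` trait or its real syntax tree nodes.

```rust
/// Hypothetical expression tree (not the project's real AST).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(i64),
    Var(char),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Folds sub-expressions whose operands are all known at compile time.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            // Both sides are constants: compute the result now
            (Expr::Num(a), Expr::Num(b)) => Expr::Num(a.wrapping_add(b)),
            // Otherwise keep the addition, with folded operands
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Num(a), Expr::Num(b)) => Expr::Num(a.wrapping_mul(b)),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        // Numbers and variables are already as simple as they can be
        leaf => leaf,
    }
}
```

With this sketch, `2 + 3 * 4` folds to a single number, while `A + 2 * 3` keeps the addition and only folds its right-hand side.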
# Fun
This compiler outputs multiple files in addition to the executable binary. It actually outputs two
binaries:
- `udump.elf` (`u` standing for unoptimized) is the functioning program without any
optimization
- `dump.elf` is the final file that a compiler would output: an executable ELF file with
optimizations
## Assembly
In addition to executable files, this compiler outputs the corresponding assembly code. As with the
executables, there are `udump.asm` and `dump.asm`.
These are also fully functional; in fact, you can assemble them with `nasm`. There is a `make asm`
rule to assemble and link `dump.asm` into `asm.out`.
Interestingly, when compiling with `nasm` you'll notice the file is way bigger than with this
compiler (for `opt.tb`, 8.3K after stripping vs 493 bytes). This is mainly due to `nasm` doing its
job correctly, using multiple sections and aligning them. `xxd asm.out` will reveal the file is mainly zeros.
## [Graphviz](https://graphviz.org/)
For educational purposes, the parse tree, AST and OST (optimized syntax tree) are dumped using
Graphviz. Those are very interesting to compare, especially on `opt.tb`, which will be stripped of
unused calculations and statements.
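For readers unfamiliar with Graphviz, dumping a tree amounts to writing one DOT node per tree node and one edge per parent/child link. The sketch below assumes a hypothetical `Node` type and `to_dot` function. It is not the project's `graphviz` module.

```rust
/// Hypothetical tree node (the project's real tree types are not shown here).
struct Node {
    label: String,
    children: Vec<Node>,
}

/// Appends DOT statements for `node` and its subtree to `out`.
/// Returns the numeric id assigned to `node`.
/// Labels are written as-is, so they must not contain double quotes.
fn write_node(node: &Node, next_id: &mut usize, out: &mut String) -> usize {
    let id = *next_id;
    *next_id += 1;
    out.push_str(&format!("  n{} [label=\"{}\"];\n", id, node.label));
    for child in &node.children {
        let child_id = write_node(child, next_id, out);
        out.push_str(&format!("  n{} -> n{};\n", id, child_id));
    }
    id
}

/// Renders a whole tree as a DOT `digraph`.
fn to_dot(root: &Node) -> String {
    let mut out = String::from("digraph tree {\n");
    let mut next_id = 0;
    write_node(root, &mut next_id, &mut out);
    out.push_str("}\n");
    out
}
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` tool, for example `dot -Tpng tree.dot -o tree.png`.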
# Project overview
The code is partially commented. Here is a list of the main modules and files in order of interest.
- [`parser`](src/parser.rs) is the first step of the compiler. It contains the pest-generated
parser and the grammar
- [`syntax_tree`](src/syntax_tree.rs) holds the AST and its nodes. It handles their conversion from
the parse tree and generation as assembly or Graphviz
- [`symbol_table`](src/symbol_table.rs) contains the symbol table and related code
- [`graphviz`](src/graphviz.rs) contains everything related to [Graphviz](https://graphviz.org/)
# Possible improvements
In no particular order.
- Handle strings for `INPUT`
- Handle numbers greater than 9 for `INPUT`. It is non-trivial to convert integers to strings in
assembly, and even harder to convert strings to integers; the `read` syscall only returns what you'd consider a string
- Put strings in variables
- Implement `GOSUB` and `RETURN` properly
- Remove unused assembly functions (`print`, `printn`, `read` in [`syntax_tree.rs`](src/syntax_tree.rs))
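As a reference for the multi-digit `INPUT` item above, here is the conversion logic written in Rust rather than assembly. The function names `parse_decimal` and `format_decimal` are hypothetical, and the generated assembly would have to implement the same loops by hand.

```rust
/// Parses leading ASCII decimal digits, stopping at the first non-digit
/// byte (such as a trailing `\n`). Overflow wraps around silently.
fn parse_decimal(bytes: &[u8]) -> u64 {
    let mut value: u64 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            break;
        }
        // Shift the digits read so far one decimal place and add the new one
        value = value.wrapping_mul(10).wrapping_add(u64::from(b - b'0'));
    }
    value
}

/// Formats an integer as ASCII decimal digits, most significant first.
fn format_decimal(mut value: u64) -> Vec<u8> {
    let mut digits = Vec::new();
    loop {
        // Digits come out least significant first
        digits.push(b'0' + (value % 10) as u8);
        value /= 10;
        if value == 0 {
            break;
        }
    }
    digits.reverse();
    digits
}
```

Because `parse_decimal` stops at the first non-digit byte, this approach would also tolerate the trailing newline mentioned in NOTE 3.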