https://github.com/valkmjolnir/brainfuck-jit
Brainfuck Just-In-Time compiler written in C++
- Host: GitHub
- URL: https://github.com/valkmjolnir/brainfuck-jit
- Owner: ValKmjolnir
- License: mit
- Created: 2021-10-25T14:08:48.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2023-02-08T09:37:17.000Z (over 2 years ago)
- Last Synced: 2025-01-12T14:11:45.524Z (10 months ago)
- Topics: brainfuck, compiler, cpp, esolang, esoteric-interpreter, esoteric-language, esoteric-programming-language, interpreter, jit, just-in-time
- Language: Brainfuck
- Homepage:
- Size: 35.2 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Brainfuck Just-In-Time Compiler
## __Introduction__
Brainfuck is a very interesting programming language that has only 8 operators:
|Operator|Code in C/C++|
|:----|:----|
|`+`|`buff[ptr]++`|
|`-`|`buff[ptr]--`|
|`>`|`ptr++`|
|`<`|`ptr--`|
|`[`|`if(!buff[ptr]) goto ']'`|
|`]`|`if(buff[ptr]) goto '['`|
|`,`|`buff[ptr]=getchar()`|
|`.`|`putchar(buff[ptr])`|
This simple syntax makes brainfuck a great language for me to learn how to build an interpreter and a JIT (just-in-time) compiler.
## __Brainfuck Interpreter__
This project has a simple interpreter for brainfuck,
using switch-threading:
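```C++
for(size_t i=0;i<...
```
Runs of the same operator are merged into a single opcode with a count, for example:
|bf code|Code in C/C++|
|:----|:----|
|`>>>>>`|`p+=5`|
|`<<`|`p-=2`|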
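The merging step and the switch-based loop can be sketched as follows. This is a minimal sketch, not the project's actual code: the `opcode` struct, `scan`, and `run` are hypothetical names, the `op_*` names are borrowed from the JIT section of this README, and output is collected into a string instead of calling `putchar` so the result is easy to check.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical opcode set; names mirror the op_* names used in the JIT section.
enum op_kind : uint8_t { op_add, op_sub, op_addp, op_subp, op_jf, op_jt, op_in, op_out };

struct opcode {
    op_kind op;
    uint32_t num; // repeat count, or the partner index for op_jf/op_jt
};

// Translate source text into opcodes, merging runs of + - > < into one opcode.
std::vector<opcode> scan(const std::string& src) {
    std::vector<opcode> code;
    std::vector<size_t> open; // indices of '[' still waiting for their ']'
    for (char c : src) {
        op_kind k;
        switch (c) {
            case '+': k = op_add; break;
            case '-': k = op_sub; break;
            case '>': k = op_addp; break;
            case '<': k = op_subp; break;
            case '[': k = op_jf; break;
            case ']': k = op_jt; break;
            case ',': k = op_in; break;
            case '.': k = op_out; break;
            default: continue; // any other character is a comment
        }
        if (k <= op_subp && !code.empty() && code.back().op == k) {
            ++code.back().num; // extend the current run
            continue;
        }
        code.push_back({k, 1});
        if (k == op_jf) {
            open.push_back(code.size() - 1);
        } else if (k == op_jt) {
            assert(!open.empty() && "unmatched ]");
            size_t begin = open.back();
            open.pop_back();
            code[begin].num = static_cast<uint32_t>(code.size() - 1); // '[' -> ']'
            code.back().num = static_cast<uint32_t>(begin);           // ']' -> '['
        }
    }
    assert(open.empty() && "unmatched [");
    return code;
}

// Switch-based interpreter. No bounds checks on the cell pointer (sketch only).
std::string run(const std::vector<opcode>& code, const std::string& in = "") {
    std::vector<uint8_t> buff(0x20000, 0);
    size_t p = 0, next_in = 0;
    std::string out;
    for (size_t i = 0; i < code.size(); ++i) {
        switch (code[i].op) {
            case op_add:  buff[p] += code[i].num; break;
            case op_sub:  buff[p] -= code[i].num; break;
            case op_addp: p += code[i].num; break;
            case op_subp: p -= code[i].num; break;
            case op_jf:   if (!buff[p]) i = code[i].num; break; // skip to ']'
            case op_jt:   if (buff[p]) i = code[i].num; break;  // back to '['
            case op_in:   buff[p] = next_in < in.size() ? in[next_in++] : 0; break;
            case op_out:  out += static_cast<char>(buff[p]); break;
        }
    }
    return out;
}
```

With this sketch, `scan(">>>>>")` yields a single `op_addp` opcode whose `num` is 5, which is the same repeat count the JIT later reads as `op.num`.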
## __Just-In-Time Compiler__
### __mmap__
After generating opcodes,
it is straightforward to emit machine code into a memory region allocated by `mmap`.
This region must be mapped `read/write/exec` so that the emitted machine code can be executed in place.
You can see the `mmap` call in `amd64jit::amd64jit(const size_t)` in file `amd64jit.h`.
I use a global u8 array `buff[0x20000]` as the tape of the brainfuck machine (`rbx` holds the pointer into it),
and remember to zero-fill this space with `memset` before running.
```C++
/* set bf machine's tape pointer */
mem.push({0x48,0xbb}).push64((uint64_t)buff); // movq $buff,%rbx
```
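To illustrate the `mmap` step on its own, here is a minimal sketch that maps one read/write/exec page, copies a six-byte stub into it, and calls it. The function name `run_tiny_stub` and the stub itself are illustrative and not taken from `amd64jit.h`. The sketch assumes a POSIX system on x86-64 and returns -1 anywhere else, or when the kernel refuses the mapping.

```C++
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__unix__) || defined(__APPLE__)
#include <sys/mman.h>
#endif

// Copies a tiny stub into a read/write/exec mapping and calls it.
// Returns the stub's result (42), or -1 when the sketch does not apply:
// non-POSIX system, non-x86-64 target, or a policy that forbids RWX mappings.
int run_tiny_stub() {
#if (defined(__unix__) || defined(__APPLE__)) && defined(__x86_64__)
    static const uint8_t stub[] = {
        0xb8, 0x2a, 0x00, 0x00, 0x00, // mov $42,%eax
        0xc3                          // ret
    };
    const size_t size = 4096;
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    std::memcpy(mem, stub, sizeof(stub));
    // Treat the start of the mapping as a function taking no arguments.
    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    int result = fn();
    munmap(mem, size);
    return result;
#else
    return -1;
#endif
}
```

Some hardened systems reject writable and executable mappings; a common alternative is to map the region read/write, emit the code, and then switch it to read/exec with `mprotect`.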
### __Add & Sub Operations__
These four operators are straightforward to translate to machine code:
```C++
case op_add: mem.push({0x80,0x03,(uint8_t)(op.num&0xff)}); break; // addb $op.num,(%rbx)
case op_sub: mem.push({0x80,0x2b,(uint8_t)(op.num&0xff)}); break; // subb $op.num,(%rbx)
case op_addp: mem.push({0x48,0x81,0xc3}).push32(op.num); break; // add $op.num,%rbx
case op_subp: mem.push({0x48,0x81,0xeb}).push32(op.num); break; // sub $op.num,%rbx
```
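The byte sequences above can be checked in isolation. The following sketch re-creates two of the four encodings on top of a plain `std::vector`; the helper names (`bytes`, `push`, `push32`, `emit_add`, `emit_addp`) are hypothetical stand-ins for the project's `mem.push(...)` and `push32(...)`, whose real definitions are not shown here.

```C++
#include <cstdint>
#include <initializer_list>
#include <vector>

using bytes = std::vector<uint8_t>;

// Append raw bytes.
void push(bytes& out, std::initializer_list<uint8_t> b) {
    out.insert(out.end(), b);
}

// Append a 32-bit immediate in little-endian order, as x86-64 expects.
void push32(bytes& out, uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<uint8_t>(v >> shift));
}

// addb $n,(%rbx): opcode 0x80, ModRM 0x03, then an 8-bit immediate.
// Cells are bytes, so only the low 8 bits of the merged count matter.
void emit_add(bytes& out, uint32_t n) {
    push(out, {0x80, 0x03, static_cast<uint8_t>(n & 0xff)});
}

// add $n,%rbx: REX.W prefix 0x48, opcode 0x81, ModRM 0xc3, then a 32-bit immediate.
void emit_addp(bytes& out, uint32_t n) {
    push(out, {0x48, 0x81, 0xc3});
    push32(out, n);
}
```

For example, `emit_addp` with `n = 5` produces `48 81 c3 05 00 00 00`. The `sub` variants differ only in the ModRM byte (`0x2b` and `0xeb`).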
### __Library Function putchar & getchar__
#### __putchar__
```C++
int putchar(int);
```
`op_out` calls `putchar`.
To find the right bytes, write a small demo and use objdump to see how gcc and clang generate the machine code that calls the function,
then just copy it :)
```C++
mem.push({0x48,0xb8}).push64((uint64_t)putchar); // movabs $putchar,%rax
#ifndef _WIN32
mem.push({0x0f,0xbe,0x3b}); // movsbl (%rbx),%edi
#else
mem.push({0x0f,0xbe,0x0b}); // movsbl (%rbx),%ecx
#endif
mem.push({0xff,0xd0}); // callq *%rax
```
You may notice a small difference in the generated machine code on Windows.
This is because the parameter-passing rules of the Windows __calling convention__ differ from those of Linux/macOS/Unix:
Linux/macOS/Unix pass the first integer parameter in `rdi`, while Windows passes it in `rcx`.
JIT-compiler developers have to remember this rule,
but the x86_64/amd64 calling conventions are much easier to remember than the 32-bit x86 ones...
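The only byte that changes between the two platforms is the ModRM byte of `movsbl`, because the destination register is encoded in its middle three bits. A small sketch of that arithmetic follows; the names `reg_ecx`, `reg_rbx`, `reg_edi`, and `modrm_load` are illustrative and do not come from the project.

```C++
#include <cstdint>

// Register numbers as they appear in ModRM fields (low three bits).
constexpr uint8_t reg_ecx = 1; // first integer argument on Windows x64
constexpr uint8_t reg_rbx = 3; // base register holding the cell pointer
constexpr uint8_t reg_edi = 7; // first integer argument in the System V ABI

// ModRM for "load reg from (base)": mod=00 (memory, no displacement),
// then the destination register, then the base register.
// Bases 4 (rsp) and 5 (rbp) have special meanings with mod=00 and are not handled.
constexpr uint8_t modrm_load(uint8_t reg, uint8_t base) {
    return static_cast<uint8_t>((0u << 6) | (reg << 3) | base);
}

static_assert(modrm_load(reg_edi, reg_rbx) == 0x3b, "movsbl (%rbx),%edi");
static_assert(modrm_load(reg_ecx, reg_rbx) == 0x0b, "movsbl (%rbx),%ecx");
```

The two `static_assert` lines reproduce the `0x3b` and `0x0b` bytes used in the `#ifndef _WIN32` branches above.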
#### __getchar__
```C++
int getchar();
```
`op_in` calls `getchar`;
again, use objdump to see how gcc/clang generate the code,
and just copy it :)
Luckily, on Windows/Linux/macOS/Unix the `int` return value is always stored in register `rax`. We only need to move the low 8 bits of `rax` to `rbx[0]` (that is, `movb %al,(%rbx)`).
```C++
mem.push({0x48,0xb8}).push64((uint64_t)getchar); // movabs $getchar,%rax
mem.push({0xff,0xd0}); // callq *%rax
mem.push({0x88,0x03}); // movb %al,(%rbx)
```
So we don't need to write `#ifndef _WIN32` and so on :)
### __Jump Operation__
`je` and `jne` are the two tricky parts of this project.
The distance between the two jump labels must be calculated correctly, because each jump stores a 32-bit offset relative to the end of its own instruction.
```C++
amd64jit& amd64jit::je() {
    push({0x0f,0x84}).push32(0x0); // je, offset is patched later in jne()
    stk.push(ptr);                 // remember the address right after this je
    return *this;
}
amd64jit& amd64jit::jne() {
    push({0x0f,0x85}).push32(0x0); // jne
    uint8_t* je_next=stk.top();stk.pop();
    uint8_t* jne_next=ptr;
    uint64_t p0=jne_next-je_next;  // forward distance, used by je
    uint64_t p1=je_next-jne_next;  // backward distance, used by jne
    // write both 32-bit offsets in little-endian order
    jne_next[-4]=(p1&0xff);
    jne_next[-3]=((p1>>8)&0xff);
    jne_next[-2]=((p1>>16)&0xff);
    jne_next[-1]=((p1>>24)&0xff);
    je_next[-4]=(p0&0xff);
    je_next[-3]=((p0>>8)&0xff);
    je_next[-2]=((p0>>16)&0xff);
    je_next[-1]=((p0>>24)&0xff);
    return *this;
}
```
`op_jf` (`[`) uses `je` and `op_jt` (`]`) uses `jne`.
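The same back-patching idea can be sketched with buffer offsets instead of raw pointers, which makes the arithmetic easy to test. Everything here is hypothetical (`emitter`, `loop_begin`, `loop_end`, `patch32`). The excerpt above also does not show how the flags are set before each jump, so this sketch assumes a `cmpb $0,(%rbx)` is emitted in front of both `je` and `jne`.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

struct emitter {
    std::vector<uint8_t> code;
    std::vector<size_t> open; // offsets just past each je that is still unpatched

    void push(std::initializer_list<uint8_t> b) { code.insert(code.end(), b); }

    // Overwrite the 4 bytes ending at offset `end` with a little-endian rel32.
    void patch32(size_t end, int32_t rel) {
        uint32_t v = static_cast<uint32_t>(rel);
        for (int i = 0; i < 4; ++i)
            code[end - 4 + i] = static_cast<uint8_t>(v >> (8 * i));
    }

    void loop_begin() {                  // '['
        push({0x80, 0x3b, 0x00});        // cmpb $0,(%rbx)  (assumed, see lead-in)
        push({0x0f, 0x84, 0, 0, 0, 0});  // je with a placeholder offset
        open.push_back(code.size());
    }

    void loop_end() {                    // ']'
        assert(!open.empty());
        push({0x80, 0x3b, 0x00});        // cmpb $0,(%rbx)  (assumed, see lead-in)
        push({0x0f, 0x85, 0, 0, 0, 0});  // jne with a placeholder offset
        size_t je_next = open.back();
        open.pop_back();
        size_t jne_next = code.size();
        int32_t dist = static_cast<int32_t>(jne_next - je_next);
        patch32(je_next, dist);          // forward: jump past the loop
        patch32(jne_next, -dist);        // backward: jump to the loop body
    }
};
```

For the loop `[-]`, calling `loop_begin()`, pushing the three bytes of `subb $1,(%rbx)`, and calling `loop_end()` gives 21 bytes in total, with `0c 00 00 00` as the `je` offset and `f4 ff ff ff` (that is, -12) as the `jne` offset.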
### __Conclusion__
|bf code|opcode|machine code|
|:----|:----|:----|
|`+`|op_add|`addb $op.num,(%rbx)`|
|`-`|op_sub|`subb $op.num,(%rbx)`|
|`>`|op_addp|`add $op.num,%rbx`|
|`<`|op_subp|`sub $op.num,%rbx`|
|`[`|op_jf|`je label`|
|`]`|op_jt|`jne label`|
|`,`|op_in|`callq *%rax` & `movb %al,(%rbx)`|
|`.`|op_out|`callq *%rax`|
Hope you enjoy it.
### __More__
Want to check the machine code generated for different CPU architectures?
This website may help:
[__gcc.godbolt.org__](https://gcc.godbolt.org/)