https://github.com/valkmjolnir/brainfuck-jit

Brainfuck Just-In-Time compiler written in C++
https://github.com/valkmjolnir/brainfuck-jit

brainfuck compiler cpp esolang esoteric-interpreter esoteric-language esoteric-programming-language interpreter jit just-in-time

Last synced: 8 months ago
JSON representation

Brainfuck Just-In-Time compiler written in C++

Host: GitHub
URL: https://github.com/valkmjolnir/brainfuck-jit
Owner: ValKmjolnir
License: mit
Created: 2021-10-25T14:08:48.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2023-02-08T09:37:17.000Z (over 2 years ago)
Last Synced: 2025-01-12T14:11:45.524Z (10 months ago)
Topics: brainfuck, compiler, cpp, esolang, esoteric-interpreter, esoteric-language, esoteric-programming-language, interpreter, jit, just-in-time
Language: Brainfuck
Homepage:
Size: 35.2 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # Brainfuck Just-In-Time Compiler

## __Introduction__

Brainfuck is a very interesting programming language that has only 8 operators:

|Operator|Code in C/C++|

|:----|:----|

|`+`|`buff[ptr]++`|

|`-`|`buff[ptr]--`|

|`>`|`ptr++`|

|`<`|`ptr--`|

|`[`|`if(!buff[ptr]) goto ']'`|

|`]`|`if(buff[ptr] goto '['`|

|`,`|`buff[ptr]=getchar()`|

|`.`|`putchar(buff[ptr])`|

This simple syntax makes brainfuck a great language for me to learn how to build an interpreter and jit(just-in-time) compiler.

## __Brainfuck Interpreter__

This project has a simple interpreter for brainfuck,

using switch-threading:

```C++

for(size_t i=0;i>>>>`|`p+=5`|

|`<<`|`p-=2`|

## __Just-In-Time Compiler__

### __mmap__

After generating opcodes,

it's quite easy for us to generate machine codes into a memory space allocated by `mmap`.

This memory space must be `read/write/exec` so we could execute the machine codes in this memory space.

You could see the `mmap` in `amd64jit::amd64jit(const size_t)` in file `amd64jit.h`.

I use a global u8 array `buff[0x20000]` to be the paper of brainfuck machine(and `rbx` stores the pointer),

and remember to use memset to clean the stack space to zero-filled.

```C++

/* set bf machine's paper pointer */

mem.push({0x48,0xbb}).push64((uint64_t)buff); // movq $buff,%rbx

```

### __Add & Sub Operations__

These four operators are not so difficult to translate to machine codes:

```C++

case op_add:  mem.push({0x80,0x03,(uint8_t)(op.num&0xff)}); break; // addb $op.num,(%rbx)

case op_sub:  mem.push({0x80,0x2b,(uint8_t)(op.num&0xff)}); break; // subb $op.num,(%rbx)

case op_addp: mem.push({0x48,0x81,0xc3}).push32(op.num);    break; // add $op.num,%rbx

case op_subp: mem.push({0x48,0x81,0xeb}).push32(op.num);    break; // sub $op.num,%rbx

```

### __Library Function putchar & getchar__

#### __putchar__

```C++

int putchar(int);

```

`op_out` uses the `putchar`,

write a demo and use objdump to see how the gcc and clang generate the machine code that calls the function,

then just copy them :)

```C++

mem.push({0x48,0xb8}).push64((uint64_t)putchar); // movabs $putchar,%rax

#ifndef _WIN32

mem.push({0x0f,0xbe,0x3b}); // movsbl (%rbx),%edi

#else

mem.push({0x0f,0xbe,0x0b}); // movsbl (%rbx),%ecx

#endif

mem.push({0xff,0xd0}); // callq *%rax

```

You may find that there's a small difference between generated machine code on Windows platform.

This is because the rule of parameter passing in __call convention__ of Windows is different from Linux/macOS/Unix.

And Linux/macOS/Unix use `rdi` to get the first parameter, but Windows uses `rcx`.

Although JIT-compiler developers should remember this rule,

it is quite easier to remember x86_64/amd64 call convention than x86_32...

#### __getchar__

```C++

int getchar();

```

`op_in` uses the `getchar`,

also we just use the objdump to see how gcc/clang generate the code,

and just copy them :)

Luckily, on Windows/Linux/macOS/Unix platform, the return value `int` will all be stored in register `rax`. And we just need to mov the low 8-bits of `rax` to `rbx[0]` (aka `movsbl %al,(%rbx)`).

```C++

mem.push({0x48,0xb8}).push64((uint64_t)getchar); // movabs $getchar,%rax

mem.push({0xff,0xd0}); // callq *%rax

mem.push({0x88,0x03}); // movsbl %al,(%rbx)

```

So we don't need to write `#ifndef _WIN32` and so on :)

### __Jump Operation__

`je` and `jne` are two difficulties in this project.

You must calculate the distance of two jump labels to make sure they work correctly.

```C++

amd64jit& amd64jit::je() {

    push({0x0f,0x84}).push32(0x0);// je

    stk.push(ptr);

    return *this;

}

amd64jit& amd64jit::jne() {

    push({0x0f,0x85}).push32(0x0);// jne

    uint8_t* je_next=stk.top();stk.pop();

    uint8_t* jne_next=ptr;

    uint64_t p0=jne_next-je_next;

    uint64_t p1=je_next-jne_next;

    jne_next[-4]=(p1&0xff);

    jne_next[-3]=((p1>>8)&0xff);

    jne_next[-2]=((p1>>16)&0xff);

    jne_next[-1]=((p1>>24)&0xff);

    je_next[-4]=(p0&0xff);

    je_next[-3]=((p0>>8)&0xff);

    je_next[-2]=((p0>>16)&0xff);

    je_next[-1]=((p0>>24)&0xff);

    return *this;

}

```

op_jf(`[`) uses the `je` and op_jt(`]`) uses the `jne`.

### __Conclusion__

|bf code|opcode|machine code|

|:----|:----|:----|

|`+`|op_add|`addb $op.num,(%rbx)`|

|`-`|op_sub|`subb $op.num,(%rbx)`|

|`>`|op_addp|`add $op.num %rbx`|

|`<`|op_subp|`sub $op.num %rbx`|

|`[`|op_jf|`je label`|

|`]`|op_jt|`jne label`|

|`,`|op_in|`callq *%rax` & `movsbl %al,(%rbx)`|

|`.`|op_out|`callq *%rax`|

Hope you enjoy it.

### __More__

Want to check the output machine code of different CPU arch?

You may need this website:

[__gcc.godbolt.org__](https://gcc.godbolt.org/)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/valkmjolnir/brainfuck-jit

Awesome Lists containing this project

README