https://github.com/skx/assembler
Basic X86-64 assembler, written in golang
- Host: GitHub
- URL: https://github.com/skx/assembler
- Owner: skx
- License: gpl-2.0
- Created: 2020-10-01T17:41:22.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-12-02T07:14:55.000Z (about 4 years ago)
- Last Synced: 2024-09-29T13:10:52.973Z (3 months ago)
- Topics: assembler, assembly, compiler, golang, x86-64
- Language: Go
- Homepage:
- Size: 94.7 KB
- Stars: 65
- Watchers: 4
- Forks: 11
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[![GoDoc](https://img.shields.io/static/v1?label=godoc&message=reference&color=blue)](https://pkg.go.dev/github.com/skx/assembler)
[![Go Report Card](https://goreportcard.com/badge/github.com/skx/assembler)](https://goreportcard.com/report/github.com/skx/assembler)
[![license](https://img.shields.io/github/license/skx/assembler.svg)](https://github.com/skx/assembler/blob/master/LICENSE)

* [Assembler](#assembler)
  * [Limitations](#limitations)
  * [Installation](#installation)
  * [Example Usage](#example-usage)
* [Internals](#internals)
  * [Adding New Instructions](#adding-new-instructions)
  * [Debugging Generated Binaries](#debugging-generated-binaries)
* [Bugs?](#bugs)

# Assembler
This repository contains a VERY BASIC x86-64 assembler, which is capable of
reading assembly-language input and generating a statically linked ELF binary
output.

It is more a proof-of-concept than a useful assembler, but I hope to take it to the point where it can compile the kind of x86-64 assembly I produce in some of my other projects.
Currently the assembler will generate a binary which looks like this:
```
$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
statically linked, no section header
```

Why? I've written a couple of toy projects that generate assembly-language programs and then pass them through an assembler:
* [brainfuck compiler](https://github.com/skx/bfcc/)
* [math compiler](https://github.com/skx/math-compiler/)

The code in this repository was born out of experimenting with generating an ELF binary directly, which was a necessary learning process.
## Limitations
We don't support anywhere near the complete instruction set that an assembly-language programmer would expect. Currently we support only the following:

* `add $REG, $REG` + `add $REG, $NUMBER`
  * Add a number, or the contents of another register, to a register.
* `call $LABEL`
  * See [call.asm](call.asm) for an example.
* `dec $REG`
  * Decrement the contents of the specified register.
  * We also support indirection, so the following work:
    * `dec byte ptr [$REG]`
    * `dec word ptr [$REG]`
    * `dec dword ptr [$REG]`
    * `dec qword ptr [$REG]`
* `inc $REG`
  * Increment the contents of the specified register.
  * We also support indirection, so the following work:
    * `inc byte ptr [$REG]`
    * `inc word ptr [$REG]`
    * `inc dword ptr [$REG]`
    * `inc qword ptr [$REG]`
* `jmp $LABEL`, `je $LABEL`, `jne $LABEL`
  * We support these jump instructions, but only with 8-bit (short) displacements, i.e. targets within -128 to +127 bytes of the end of the jump instruction.
  * See [jmp.asm](jmp.asm) for a simple example.
* `mov $REG, $NUMBER`
* `mov $REG, $REG`
  * Move a number, or the contents of another register, into the specified register.
* `nop`
  * Do nothing.
* `push $NUMBER`, or `push $IDENTIFIER`
* `ret`
  * Return from call.
  * **NOTE**: We don't actually support making calls, though that can be emulated via `push`; see [jmp.asm](jmp.asm) for an example.
* `sub $REG, $REG` + `sub $REG, $NUMBER`
  * Subtract a number, or the contents of another register, from a register.
* `xor $REG, $REG`
  * Typically used as `xor rax, rax` to set a register to zero.
* `int $NUM`
  * Call the kernel.
* Processor (flag) control instructions:
  * `clc`, `cld`, `cli`, `cmc`, `stc`, `std`, and `sti`.

Note that we only support the 64-bit registers, which means `rax` is supported but `eax`, `ax`, `ah`, and `al` are specifically __not__ supported:
* `rax`
* `rcx`
* `rdx`
* `rbx`
* `rsp`
* `rbp`
* `rsi`
* `rdi`

There is _some_ support for the extended registers `r8`-`r15`, but this varies from instruction to instruction and should not be relied upon.
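To make the register list and the `inc`/`dec`/jump entries above a little more concrete, here is a minimal Go sketch of how those instructions are commonly encoded on x86-64. It is a sketch under stated assumptions, not this project's compiler: `encodeIncDec` and `encodeShortJump` are hypothetical names, only the eight registers listed above are handled, and memory operands are limited to plain `[reg]` (`[rsp]` and `[rbp]` need extra encoding bytes, so the sketch rejects them).

```go
package main

import "fmt"

// regNum maps the eight supported 64-bit registers to their 3-bit x86
// register numbers; the order matches the register list above.
var regNum = map[string]byte{
	"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
	"rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
}

// encodeIncDec encodes `inc`/`dec` on a register (size == "") or on the
// memory operand `<size> ptr [reg]`.
func encodeIncDec(op, size, reg string) ([]byte, error) {
	r, ok := regNum[reg]
	if !ok {
		return nil, fmt.Errorf("unsupported register %q", reg)
	}
	// The ModRM "reg" field selects the operation: /0 is inc, /1 is dec.
	var ext byte
	switch op {
	case "inc":
		ext = 0
	case "dec":
		ext = 1
	default:
		return nil, fmt.Errorf("unsupported op %q", op)
	}
	if size == "" {
		// REX.W prefix + FF /ext, ModRM mod=11 (register operand).
		return []byte{0x48, 0xFF, 0xC0 | ext<<3 | r}, nil
	}
	if r == 4 || r == 5 {
		// [rsp] needs a SIB byte and mod=00 with rbp means RIP-relative,
		// so this sketch does not handle them.
		return nil, fmt.Errorf("[%s] addressing is not handled here", reg)
	}
	modrm := ext<<3 | r // mod=00: plain [reg] memory operand
	switch size {
	case "byte":
		return []byte{0xFE, modrm}, nil
	case "word":
		return []byte{0x66, 0xFF, modrm}, nil // 0x66 = operand-size prefix
	case "dword":
		return []byte{0xFF, modrm}, nil
	case "qword":
		return []byte{0x48, 0xFF, modrm}, nil // 0x48 = REX.W
	}
	return nil, fmt.Errorf("unsupported size %q", size)
}

// encodeShortJump encodes jmp/je/jne with an 8-bit displacement, measured
// from the end of the 2-byte instruction that starts at address `from`.
func encodeShortJump(op string, from, to int) ([]byte, error) {
	opcodes := map[string]byte{"jmp": 0xEB, "je": 0x74, "jne": 0x75}
	opc, ok := opcodes[op]
	if !ok {
		return nil, fmt.Errorf("unsupported jump %q", op)
	}
	disp := to - (from + 2)
	if disp < -128 || disp > 127 {
		return nil, fmt.Errorf("displacement %d does not fit in 8 bits", disp)
	}
	return []byte{opc, byte(int8(disp))}, nil
}

func main() {
	b, _ := encodeIncDec("inc", "", "rax")
	fmt.Printf("inc rax             -> % x\n", b) // 48 ff c0
	b, _ = encodeIncDec("dec", "qword", "rcx")
	fmt.Printf("dec qword ptr [rcx] -> % x\n", b) // 48 ff 09
	b, _ = encodeShortJump("jmp", 0x10, 0x10)
	fmt.Printf("jmp to itself       -> % x\n", b) // eb fe
}
```

The register order above (`rax`=0 through `rdi`=7) is exactly the 3-bit number x86 packs into the ModRM byte. `r8`-`r15` need an extra REX prefix bit on top of that, which may be why their support varies per instruction.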
There is support for storing fixed data within the program and referencing it; see [hello.asm](hello.asm) for an example.
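One way to see why data must be declared before it is referenced (one of the limitations listed below) is a single-pass symbol table: when a reference is met, the assembler can only look up labels it has already seen. The Go sketch below is hypothetical; `dataSection`, its methods, and the `0x600000` load address are illustrative assumptions, not taken from this project, and the sketch assumes the section's load address is already known.

```go
package main

import (
	"errors"
	"fmt"
)

// dataSection is a hypothetical model of fixed data: bytes are appended in
// declaration order and each label remembers where its bytes start.
type dataSection struct {
	base    uint64            // assumed load address of the section
	bytes   []byte            // contents, in declaration order
	offsets map[string]uint64 // label -> offset from base
}

func newDataSection(base uint64) *dataSection {
	return &dataSection{base: base, offsets: map[string]uint64{}}
}

// declare appends data under a new label.
func (d *dataSection) declare(label string, data []byte) error {
	if _, dup := d.offsets[label]; dup {
		return fmt.Errorf("label %q declared twice", label)
	}
	d.offsets[label] = uint64(len(d.bytes))
	d.bytes = append(d.bytes, data...)
	return nil
}

// address resolves a label in a single pass: anything not yet declared is
// an error, which is exactly the "declare before use" restriction.
func (d *dataSection) address(label string) (uint64, error) {
	off, ok := d.offsets[label]
	if !ok {
		return 0, errors.New("reference to undeclared data: " + label)
	}
	return d.base + off, nil
}

func main() {
	d := newDataSection(0x600000) // illustrative address only
	_ = d.declare("msg", []byte("Hello, world\n"))
	_ = d.declare("bye", []byte("Goodbye, world\n"))
	if addr, err := d.address("bye"); err == nil {
		fmt.Printf("bye is at %#x\n", addr) // bye is at 0x60000d
	}
	if _, err := d.address("later"); err != nil {
		fmt.Println(err) // reference to undeclared data: later
	}
}
```

A two-pass assembler, or one that records the places needing an address and patches them once everything is laid out, would lift this restriction.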
We also have some other (obvious) limitations:

* There is notably no support for comparison instructions, and jumping instructions.
  * We _emulate_ (unconditional) jump instructions via "`push`" and "`ret`"; see [jmp.asm](jmp.asm) for an example of that.
* The entry-point is __always__ at the beginning of the source.
* You can only reference data AFTER it has been declared.
  * Data is added to the `data` section of the generated binary, but must be defined before it is used.
  * See [hello.asm](hello.asm) for an example of that.

## Installation
If you have this repository cloned locally you can build the assembler like so:

```
cd cmd/assembler
go build .
go install .
```

If you wish to fetch and install via your existing toolchain:

```
go install github.com/skx/assembler/cmd/assembler@latest
```

You can repeat for the other commands if you wish:

```
go install github.com/skx/assembler/cmd/lexer@latest
go install github.com/skx/assembler/cmd/parser@latest
```

Of course these binary names are very generic, so it is perhaps better to work locally!
## Example Usage
Build the assembler:

```
$ cd cmd/assembler
$ go build .
```

Then, from the top of the repository, compile the [sample program](test.asm) and execute it, showing the return code:

```
$ cmd/assembler/assembler test.asm && ./a.out ; echo $?
9
```

Or run the [hello.asm](hello.asm) example:

```
$ cmd/assembler/assembler hello.asm && ./a.out
Hello, world
Goodbye, world
```

You'll note that the `\n` escape sequence was correctly expanded into a newline.
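The `\n` expansion is the kind of job a lexer typically does when it reads a string literal. The sketch below is a hypothetical Go helper, `expandEscapes`, not this project's code; the set of escapes it accepts (`\n`, `\t`, `\r`, `\\`, `\"`) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// expandEscapes replaces backslash escapes in the body of a string literal
// with the characters they stand for. The set of escapes is an assumption.
func expandEscapes(s string) (string, error) {
	var out strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			out.WriteByte(s[i]) // ordinary byte, copied as-is (UTF-8 safe)
			continue
		}
		i++ // look at the character after the backslash
		if i == len(s) {
			return "", fmt.Errorf("trailing backslash in %q", s)
		}
		switch s[i] {
		case 'n':
			out.WriteByte('\n')
		case 't':
			out.WriteByte('\t')
		case 'r':
			out.WriteByte('\r')
		case '\\', '"':
			out.WriteByte(s[i])
		default:
			return "", fmt.Errorf("unknown escape \\%c in %q", s[i], s)
		}
	}
	return out.String(), nil
}

func main() {
	msg, err := expandEscapes(`Hello, world\nGoodbye, world\n`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(msg) // prints the two lines seen in the example above
}
```

If the literal uses Go-compatible syntax, the standard library's `strconv.Unquote` does the same job for a complete quoted string.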
# Internals
The core of our code consists of a small number of simple packages:
* A simple tokenizer [lexer/lexer.go](lexer/lexer.go)
* A simple parser [parser/parser.go](parser/parser.go)
  * This populates a simple internal-form/AST [parser/ast.go](parser/ast.go).
* A simple compiler [compiler/compiler.go](compiler/compiler.go)
* A simple ELF generator [elf/elf.go](elf/elf.go)
  * Taken from [vishen/go-x64-executable](https://github.com/vishen/go-x64-executable/).

In addition to the packages we also have a few binaries:
* `cmd/lexer`
  * Show the output of lexing a program.
  * This is useful for debugging and development purposes; it isn't expected to be useful to end-users.
* `cmd/parser`
  * Show the output of parsing a program.
  * This is useful for debugging and development purposes; it isn't expected to be useful to end-users.
* `cmd/assembler`
  * Assemble a program, producing an executable binary.

The commands beneath `cmd` all operate the same way: each takes a single argument, the name of a file containing assembly-language instructions.
For example, here is how you'd build and test the parser:

```
$ cd cmd/parser
$ go build .
$ ./parser ../../test.asm
&{{INSTRUCTION xor} [{REGISTER rax} {REGISTER rax}]}
&{{INSTRUCTION inc} [{REGISTER rax}]}
&{{INSTRUCTION mov} [{REGISTER rbx} {NUMBER 0x0000}]}
&{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0007}]}
&{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}
&{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0002}]}
&{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}
&{{INSTRUCTION int} [{NUMBER 0x80}]}
```

## Adding New Instructions
This is how you might add a new instruction to the assembler, for example `jmp 0x00000` or something similar:
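As a rough, hypothetical illustration of the two steps listed below (a table entry, then code emission), here is a Go sketch for operand-less instructions. `instructionLengths`, `fixedEncodings`, and `compileSimple` are stand-ins rather than the project's real code, treating the table value as an operand count is an assumption, and `syscall` (bytes `0F 05`) plays the part of the newly added instruction.

```go
package main

import "fmt"

// instructionLengths stands in for the project's InstructionLengths map.
// Treating the value as an operand count is an assumption of this sketch.
var instructionLengths = map[string]int{
	"nop": 0, "ret": 0,
	"clc": 0, "cld": 0, "cli": 0, "cmc": 0, "stc": 0, "std": 0, "sti": 0,
	"syscall": 0, // step 1: register the new instruction
}

// fixedEncodings holds the bytes emitted for operand-less instructions.
var fixedEncodings = map[string][]byte{
	"nop": {0x90}, "ret": {0xC3},
	"clc": {0xF8}, "cld": {0xFC}, "cli": {0xFA}, "cmc": {0xF5},
	"stc": {0xF9}, "std": {0xFD}, "sti": {0xFB},
	"syscall": {0x0F, 0x05}, // step 2: emit its machine code
}

// compileSimple plays the role of compileInstruction for this sketch:
// validate the operand count, then return the instruction's bytes.
func compileSimple(name string, operands []string) ([]byte, error) {
	want, ok := instructionLengths[name]
	if !ok {
		return nil, fmt.Errorf("unknown instruction %q", name)
	}
	if len(operands) != want {
		return nil, fmt.Errorf("%s expects %d operand(s), got %d", name, want, len(operands))
	}
	code, ok := fixedEncodings[name]
	if !ok {
		return nil, fmt.Errorf("no encoding for %q", name)
	}
	return code, nil
}

func main() {
	for _, src := range [][]string{{"cmc"}, {"syscall"}, {"nop", "rax"}} {
		code, err := compileSimple(src[0], src[1:])
		fmt.Printf("%-14v % x %v\n", src, code, err)
	}
}
```

Keeping the table and the encoder separate mirrors the split described below: the lexer and parser only need to know the instruction exists, while the compiler is the only place that knows its bytes.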
* Add a new entry for the instruction in [instructions/instructions.go](instructions/instructions.go)
  * i.e. Update the `InstructionLengths` map to add the instruction.
  * This will be used by both the tokenization process and the parser.
* Generate the appropriate output in `compiler/compiler.go`, inside the function `compileInstruction`.
  * i.e. Emit the binary code for the instruction.

## Debugging Generated Binaries
Launch the binary under gdb:

```
$ gdb ./a.out
```

Start it:

```
(gdb) starti
Starting program: /home/skx/Repos/github.com/skx/assembler/a.out

Program stopped.
0x00000000004000b0 in ?? ()
```

Disassemble the next five instructions:

```
(gdb) x/5i $pc
```

Or show the string contents at an address:

```
(gdb) x/s 0x400000
```
# Bugs?
Feel free to report bugs; as this is more a proof of concept than a robust tool, they are to be expected.

Specifically, we're missing support for many instructions, but I hope the code generated for those that are present is correct.
Steve