{"id":15220494,"url":"https://github.com/skx/assembler","last_synced_at":"2025-10-30T06:32:05.190Z","repository":{"id":45457089,"uuid":"300368693","full_name":"skx/assembler","owner":"skx","description":"Basic X86-64 assembler, written in golang","archived":false,"fork":false,"pushed_at":"2020-12-02T07:14:55.000Z","size":97,"stargazers_count":65,"open_issues_count":2,"forks_count":11,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-09-29T13:10:52.973Z","etag":null,"topics":["assembler","assembly","compiler","golang","x86-64"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/skx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-01T17:41:22.000Z","updated_at":"2024-09-23T11:15:42.000Z","dependencies_parsed_at":"2022-07-19T08:29:45.985Z","dependency_job_id":null,"html_url":"https://github.com/skx/assembler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fassembler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fassembler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fassembler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fassembler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/skx","download_url":"https://codeload.github.com/skx/assembler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219856579,"owner
s_count":16556082,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assembler","assembly","compiler","golang","x86-64"],"created_at":"2024-09-28T13:11:07.829Z","updated_at":"2025-10-30T06:31:59.917Z","avatar_url":"https://github.com/skx.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![GoDoc](https://img.shields.io/static/v1?label=godoc\u0026message=reference\u0026color=blue)](https://pkg.go.dev/github.com/skx/assembler)\n[![Go Report Card](https://goreportcard.com/badge/github.com/skx/assembler)](https://goreportcard.com/report/github.com/skx/assembler)\n[![license](https://img.shields.io/github/license/skx/assembler.svg)](https://github.com/skx/assembler/blob/master/LICENSE)\n\n* [Assembler](#assembler)\n  * [Limitations](#limitations)\n  * [Installation](#installation)\n  * [Example Usage](#example-usage)\n* [Internals](#internals)\n  * [Adding New Instructions](#adding-new-instructions)\n  * [Debugging Generated Binaries](#debugging-generated-binaries)\n* [Bugs?](#bugs)\n\n\n# Assembler\n\nThis repository contains a VERY BASIC x86-64 assembler, which is capable of\nreading assembly-language input, and generating a statically linked ELF binary\noutput.\n\nIt is more a proof-of-concept than a useful assembler, but I hope to take it to the state where it can compile the kind of x86-64 assembly I produce in some of my other projects.\n\nCurrently the assembler will generate a binary which looks like this:\n\n```\n$ file a.out\na.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV)\n       statically linked, no section 
header\n```\n\nWhy?  I've written a couple of toy projects that generate assembly language programs, then pass them through an assembler:\n\n* [brainfuck compiler](https://github.com/skx/bfcc/)\n* [math compiler](https://github.com/skx/math-compiler/)\n\nThe code in this repository was born out of the process of experimenting with generating an ELF binary directly.  A necessary learning-process.\n\n\n\n## Limitations\n\nWe don't support anywhere near the complete instruction-set which an assembly language programmer would expect.  Currently we support only things like this:\n\n* `add $REG, $REG` + `add $REG, $NUMBER`\n  * Add a number, or the contents of another register, to a register.\n* `call $LABEL`\n  * See [call.asm](call.asm) for an example.\n* `dec $REG`\n  * Decrement the contents of the specified register.\n  * We also support indirection, so the following work:\n    * `dec byte ptr [$REG]`\n    * `dec word ptr [$REG]`\n    * `dec dword ptr [$REG]`\n    * `dec qword ptr [$REG]`\n* `inc $REG`\n  * Increment the contents of the specified register.\n  * We also support indirection, so the following work:\n    * `inc byte ptr [$REG]`\n    * `inc word ptr [$REG]`\n    * `inc dword ptr [$REG]`\n    * `inc qword ptr [$REG]`\n* `jmp $LABEL`, `je $LABEL`, `jne $LABEL`\n  * We support jumping instructions, but only with -127/+128 byte displacements\n  * See [jmp.asm](jmp.asm) for a simple example.\n* `mov $REG, $NUMBER`\n* `mov $REG, $REG`\n  * Move a number into the specified register.\n* `nop`\n  * Do nothing.\n* `push $NUMBER`, or `push $IDENTIFIER`\n* `ret`\n  * Return from call.\n  * **NOTE**: We don't actually support making calls, though that can be emulated via `push` - see [jmp.asm](jmp.asm) for an example.\n* `sub $REG, $REG` + `sub $REG, $NUMBER`\n  * Subtract a number, or the contents of another register, from a register.\n* `xor $REG, $REG`\n  * Set the given register to be zero.\n* `int $NUM`\n  * Call the kernel.\n* Processor (flag) control 
instructions:\n  * `clc`, `cld`, `cli`, `cmc`, `stc`, `std`, and `sti`.\n\nNote that we really only support the following registers, you'll see that we only support the 64-bit registers (which means `rax` is supported but `eax`, `ax`, `ah`, and `al` are specifically __not__ supported):\n\n* `rax`\n* `rcx`\n* `rdx`\n* `rbx`\n* `rsp`\n* `rbp`\n* `rsi`\n* `rdi`\n\nThere is _some_ support for the extended registers `r8`-`r15`, but this varies on a per-instruction basis and should not be relied upon.\n\nThere is support for storing fixed-data within our program, and locating that.  See [hello.asm](hello.asm) for an example of that.\n\nWe also have some other (obvious) limitations:\n\n* There is notably no support for comparison instructions, and jumping instructions.\n  * We _emulate_ (unconditional) jump instructions via \"`push`\" and \"`ret`\", see [jmp.asm](jmp.asm) for an example of that.\n* The entry-point is __always__ at the beginning of the source.\n* You can only reference data AFTER it has been declared.\n  * These are added to the `data` section of the generated binary, but must be defined first.\n  * See [hello.asm](hello.asm) for an example of that.\n\n\n\n## Installation\n\nIf you have this repository cloned locally you can build the assembler like so:\n\n    cd cmd/assembler\n    go build .\n    go install .\n\nIf you wish to fetch and install via your existing toolchain:\n\n    go get -u github.com/skx/assembler/cmd/assembler\n\nYou can repeat for the other commands if you wish:\n\n    go get -u github.com/skx/assembler/cmd/lexer\n    go get -u github.com/skx/assembler/cmd/parser\n\nOf course these binary-names are very generic, so perhaps better to work locally!\n\n\n## Example Usage\n\nBuild the assembler:\n\n     $ cd cmd/assembler\n     $ go build .\n\nCompile the [sample program](test.asm), and execute it showing the return-code:\n\n     $ cmd/assembler/assembler test.asm \u0026\u0026 ./a.out ; echo $?\n     9\n\nOr run the [hello.asm](hello.asm) 
example:\n\n     $ cmd/assembler/assembler  hello.asm \u0026\u0026 ./a.out\n     Hello, world\n     Goodbye, world\n\nYou'll note that the `\\n` character was correctly expanded into a newline.\n\n\n# Internals\n\nThe core of our code consists of a small number of simple packages:\n\n* A simple tokenizer [lexer/lexer.go](lexer/lexer.go)\n* A simple parser [parser/parser.go](parser/parser.go)\n  * This populates a simple internal-form/AST [parser/ast.go](parser/ast.go).\n* A simple compiler [compiler/compiler.go](compiler/compiler.go)\n* A simple elf-generator [elf/elf.go](elf/elf.go)\n  * Taken from [vishen/go-x64-executable](https://github.com/vishen/go-x64-executable/).\n\n\nIn addition to the package modules we also have a couple of binaries:\n\n* `cmd/lexer`\n  * Show the output of lexing a program.\n  * This is useful for debugging and development-purposes, it isn't expected to be useful to end-users.\n* `cmd/parser`\n  * Show the output of parsing a program.\n    * This is useful for debugging and development-purposes, it isn't expected to be useful to end-users.\n* `cmd/assembler`\n  * Assemble a program, producing an executable binary.\n\nThese commands, located beneath `cmd`, each operate the same way.  
They each take a single argument which is a file containing assembly-language instructions.\n\nFor example here is how you'd build and test the parser:\n\n    cd cmd/parser\n    go build .\n    $ ./parser ../../test.asm\n    \u0026{{INSTRUCTION xor} [{REGISTER rax} {REGISTER rax}]}\n    \u0026{{INSTRUCTION inc} [{REGISTER rax}]}\n    \u0026{{INSTRUCTION mov} [{REGISTER rbx} {NUMBER 0x0000}]}\n    \u0026{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0007}]}\n    \u0026{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}\n    \u0026{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0002}]}\n    \u0026{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}\n    \u0026{{INSTRUCTION int} [{NUMBER 0x80}]}\n\n\n## Adding New Instructions\n\nThis is how you might add a new instruction to the assembler, for example you might add `jmp 0x00000` or some similar instruction:\n\n* Add a new entry for the instruction in [instructions/instructions.go](instructions/instructions.go)\n  * i.e. Update `InstructionLengths` map to add the instruction.\n  * This will be used by both the tokenization process, and the parser.\n* Generate the appropriate output in `compiler/compiler.go`, inside the function `compileInstruction`.\n  * i.e. Emit the binary-code for the instruction.\n\n\n\n## Debugging Generated Binaries\n\nLaunch the binary under gdb:\n\n    $ gdb ./a.out\n\nStart it:\n\n    (gdb) starti\n    Starting program: /home/skx/Repos/github.com/skx/assembler/a.out\n\n    Program stopped.\n    0x00000000004000b0 in ?? 
()\n\nDisassemble:\n\n    (gdb)  x/5i $pc\n\nOr show string-contents at an address:\n\n    (gdb) x/s 0x400000\n\n\n# Bugs?\n\nFeel free to report any bugs; this is more a proof of concept than a robust tool, so they are to be expected.\n\nSpecifically, we're missing support for many instructions, but I hope the code generated for those that are present is correct.\n\n\nSteve\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskx%2Fassembler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fskx%2Fassembler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskx%2Fassembler/lists"}