Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/lifthrasiir/dcputhings

Assorted Tools for DCPU-16 Development
https://github.com/lifthrasiir/dcputhings

Last synced: about 2 months ago
JSON representation

Assorted Tools for DCPU-16 Development

Awesome Lists containing this project

README

        

# dcupthings

This is the **dcputhings**, assorted tools for DCPU-16 development. This
repository is maintained by [Kang Seonghoon](http://mearie.org/) and contains
the following softwares:

* `dcpu.c` and `dcpuopt.c`, my early attempts to build a DCPU-16 emulator.
* DcpuAsm, an Ocaml DSL for DCPU-16 assembly.

## DcpuAsm

DcpuAsm is a DCPU-16 assembler *embedded in the Ocaml syntax*. It allows
easier DCPU-16 code generation, but it can also be used as an ordinary macro
assembler if you understand a bit of Ocaml.

Basically, the assembly is represented as an Ocaml list:

[
SET A, 4;
SET B, 5;
%loop:
SET PC, %loop;
]

More specifically, `SET A, 4`, `SET B, 5` and `%loop: SET PC, %loop` evaluates
to the internal representation via Camlp4. Note that Ocaml allows the trailing
semicolon in the brackets, so the last `;` is just fine.

This list can be converted to the binary via `ASM` keyword:

let code = ASM [
SET A, 4;
SET B, 5;
%loop:
SET PC, %loop;
];;
print_string (DcpuAsm.to_binary_le code)

This will resolve all the labels to the fixed offset. `to_binary_le` converts
the static code into the little-endian byte string. The big-endian counterpart
is `to_binary_be`, and you can get an array of words using `to_words`.

By default `ASM` assumes the origin at 0x0000. This can be changed using `ASM
~origin:0x1000 [...]` syntax; in fact, `ASM` is just a shorter alias to
`DcpuAsm.asm` function.

### Statements / Instructions

DcpuAsm supports the following instructions (and pseudo-instructions):

* Basic opcodes: `SET`, `ADD`, `SUB`, `MUL`, `MLI`, `DIV`, `DVI`, `MOD`, `MDI`,
`AND`, `BOR`, `XOR`, `SHR`, `ASR`, `SHL`, `IFB`, `IFC`, `IFE`, `IFN`, `IFG`,
`IFA`, `IFL`, `IFU`, `ADX`, `SBX`, `STI`, `STD`
* Special opcodes: `JSR`, `HCF`, `INT`, `IAG`, `IAS`, `IAP`, `IAQ`, `HWN`,
`HWQ`, `HWI`
* Raw data: `DAT`, `ORG`, `ALIGN`
* Syntactic extensions: `NOP`, `JMP`, `PUSH`, `POP`, `RET`, `BRK`, `HLT`
* Empty opcode (i.e. no output at all): `PASS`

Basic opcodes has two arguments, and special opcodes has one of them.
Multiple arguments are separated with `,` as much like other assemblers.

`DAT` has one or more arguments. The argument can be a typical immediate (see
below for the syntax) which occupies exactly one word, or a string which
occupies the same number of words (so that `"ok"` equals to `'o', 'k'`).
One can also use `_` for placeholders which value can be ignored; it is mostly
equivalent to `0` but `_`s at the end of the binary will be ignored.
Arguments can have a `TIMES` prefix as like `DAT 3 TIMES 0x1234`, where the
repeat count can be any expression including labels. Repeating string is also
allowed (e.g. `3 TIMES "hello?"`).

`ORG x` is equivalent to `DAT (x-%_) TIMES _`, and will set the current
assembly position to `x` if possible. It will raise an error if it is
impossible. `x` should be a positive integer.

`ALIGN x` is equivalent to `DAT ((x-(%_ MOD x)) MOD x) TIMES _`, and will set
the current assembly position to the next multiple of `x`. (Therefore it will
add at most `x-1` zeroes.)

`NOP` is equivalent to `SET A, A`. It does nothing but take one cycle. While
DCPU-16 has lots of nops, this encoding is chosen because of the simplicity of
its binary encoding (0x0001). It may change if 0x0000 also turns out to be a
nop.

`JMP a` sets the PC to `a` in the fastest or at least shortest way. There are
4 possible encodings for `JMP`: `SET PC, ...`, `XOR PC, ...`, `AND PC, ...`
and `SUB PC, ...`. (Among them `XOR` is fastest but not applicable for all
cases.) Note that the plain `SET PC, a` won't be optimized; you must
explicitly use `JMP a` instead.

`PUSH a` is equivalent to `SET PUSH, a`. `POP a` is equivalent to `SET a,
POP`. You can also use `[SP]` instead of `PEEK`, and `[SP+...]` instead of `PICK
...`.

`RET` is equivalent to `SET PC, POP`, and used for returning from the
subroutine initiated by `JSR` instruction.

`BRK` and `HLT` are equivalent to `SUB PC, 1`. This forms a simple infinite
loop, and used as a *de facto* instruction to terminate the emulator.

`PASS` does not emit the binary at all; it can be used as a placeholder.

DcpuAsm does not support `EQU` pseudo-instruction or similar; you can use an
ordinary `let x = ... in ...` construct to define constants, however.

### Labels

DcpuAsm supports labels. Labels are a valid Ocaml name (always starts with a
lowercase letter or `_`) prepended by `%`; `%asdf`, `%_foo_bar`, `%loop42`
are valid labels, for example.

There are two ways to use labels:

1. It can occur in the expression, and evaluates to the location pointed by
the label. The instruction may contain labels defined after it.
2. It can also occur at the front of the instruction (e.g. `%foo: SET A, 3`)
to declare the label. The colon (`:`) is optional for predefined
instructions, but you are recommended to keep the colon as it allows
multiple label definitions. Skipping a colon may be natural for `DAT`
instructions however.

You can define labels at the end of list; the `PASS` statement will be
implicitly added:

[
JMP %garbage;
DAT 1, 2, 3, 4;
%garbage:
]

DcpuAsm will automatically resolve labels to the appropriate position. Since
the length of instructions may vary depending on the position of labels,
DcpuAsm runs multiple passes to settle them down. If it is not stabilized
after given number of passes DcpuAsm gives up. The default limit is 50, but
can be configured like `ASM ~maxpass:10 [...]`.

It is possible to have free (undefined) labels in the assembly. DcpuAsm will
make sure that these labels, while unresolved, will not affect the other parts
of generated code. This is done by forcing all remaining immediates to always
use a longer form.

It is advised to prepend `_` to local labels. DcpuAsm has a special support
for these local labels (see below).

The special label `%_`, when used in the expression, resolves to the position
of the current instruction. For example `SET PC, %_` will be same as `%_temp:
SET PC, %_temp`. You cannot define a label named `%_`.

### Expressions

Expression can occur as an instruction's argument. It may contain registers,
numbers, labels, memory references (enclosed in `[]`) and expressions.

DcpuAsm supports all general and special registers: `A`, `B`, `C`, `X`, `Y`,
`Z`, `I`, `J`, `SP`, `PC`, `EX`, `IA`. (`IA` cannot really be used, but it is
there for the better error handling.) It also supports `PUSH`, `PEEK`, `POP` and
`PICK ...`; they cannot be used in the expression.

DcpuAsm supports numbers in base 2 (`0b101`), base 8 (`0o337`), base 10 and
base 16 (`0x1337`) just like Ocaml. Additionally a character literal (`'A'`)
will be equal to its numerical code (i.e. `int_of_char 'A'`). All numbers are
treated as built-in Ocaml numbers (31 or 63 bits long depending on the
platform) so you should be aware of it.

DcpuAsm supports all ordinary arithmetic and bitwise operations: `+`, `-`,
`*`, `DIV`, `MOD`, `NOT`, `AND`, `OR`, `XOR`, `SHL`, `SHR`. While arithmetic
operations are permitted for registers, the resulting expression has to be in
the form `register + other expression` or `[register + other expression]` due
to the constraint of DCPU-16. (The intermediate expression does *not* have to
however: `[3*A+2*(2*B-A)-(8 DIV 2)*B]` will be resolved to `[A]`, which is
perfectly valid in DCPU-16.)

As mentioned before, labels in the expression evaluate to their positions. You
can do something like this:

[
(* Returns sqrt(A) from the precomputed table. A should be less than 16.
* B will contain an integral part and A will contain a fractional part.
*)
%isqrt:
IFG A, (%_fpend-%_fpstart) DIV 2 - 1; (* bound check *)
HLT;
MUL A, 2;
SET B, [%_fpstart+A];
SET A, [%_fpstart+A+1];
JMP POP;
%_fpstart:
DAT 0x0000, 0x0000; DAT 0x0001, 0x0000;
DAT 0x0001, 0x6a0a; DAT 0x0001, 0xbb68;
DAT 0x0002, 0x0000; DAT 0x0002, 0x3c6f;
DAT 0x0002, 0x7312; DAT 0x0002, 0xa550;
DAT 0x0002, 0xd414; DAT 0x0003, 0x0000;
DAT 0x0003, 0x298b; DAT 0x0003, 0x510e;
DAT 0x0003, 0x76cf; DAT 0x0003, 0x9b05;
DAT 0x0003, 0xbddd; DAT 0x0003, 0xdf7c;
%_fpend:
]

`IMM (e)` (note the parentheses) and `PTR [e]` is a longer form of `e` and
`[e]`, respectively, and you should use them outside the assembly instruction.

Normal Ocaml expression can also be used; `VAL e` will evaluate `e` as an
Ocaml expression and use its value as an immediate. Similarly, `STR e` uses
its value as a string (only useful in `DAT` arguments). If the Ocaml
expression is simple enough (e.g. a single identifier) then you can omit
`VAL` entirely. This is very useful for compile-time constants:

let screen_base = 0x8000 in
[
SET [screen_base], 'H';
SET [screen_base+1], 'e';
SET [screen_base+2], 'l';
SET [screen_base+3], 'l';
SET [screen_base+4], 'o';
HLT;
]

DcpuAsm tries to generate the shortest code for given assembly, but you can
override this behavior by `SHORT` and `LONG` prefixes. `SHORT e` will cause an
error when `e` does not fit in the range of -1--30 or it is used as a first
operand (cannot use a short literal there), and `LONG e` will generate a longer
form of given immediate (not the instruction, so should use it twice for basic
opcodes). This only applies to a literal value; it is silently ignored in other
kind of values.

A special value `NEXT` can also be used as a part of an expression. It won't
generate the "next words" required for long literals and register-relative
addressing, so whatever the next instruction is it (or its first word) will be
the next word. It can be used for simple self-modifying programs in combination
with `SHORT` (to ensure that the next instruction is always one word long), for
example. Note that `NEXT` is not canonicalized, so `[A+NEXT*2-NEXT]` (for
example) is invalid. Only a form of `NEXT`, `[reg+NEXT]`, `[NEXT+reg]` is valid.

### Blocks

DcpuAsm supports a block as a unit of assembly instructions. They can be used
as a building block:

(* Warning: not tail-recursive. Illustration purpose only. *)
let copyn src dst n =
if n = 0 then
PASS
else if n = 1 then
SET [dst], [src]
else
BLOCK [
SET [dst], [src];
copyn (src+1) (dst+1) (n-1);
]
in
[
(* save and restore the video memory *)
copyn 0x8000 0x4000 (16*32);
copyn 0x4000 0x8000 (16*32);
HLT;
]

`BLOCK e` evaluates `e` as an *Ocaml* expression (which includes,
incidentally, a list containing assembly instructions) and makes an
instruction block out of it. You don't have use blocks if you have exactly one
instruction, as the case `n = 1` of the above code suggests.

More interesting use of blocks involves local labels:

let case k = BLOCK [ (* ... *) ] in
[
BLOCK LOCAL [
IFE A, 1;
JMP %_next;
JMP %_skip;
%_next:
case 1;
%_skip:
];
BLOCK LOCAL [
IFE A, 2;
JMP %_next;
JMP %_skip;
%_next:
case 2;
%_skip:
];
HLT;
]

`BLOCK LOCAL` will make all defined labels starting with `_` local. It is not
possible to access these local labels outside of the block (unless you use a
nasty hack). Non-local blocks (in this case, `case 1` and `case 2`) will not
affect this procedure. This is very useful for generated codes.

You can manually give a list of local labels using `BLOCK LOCAL %a, %b, %c`
syntax; or by an Ocaml list using `BLOCK LOCAL *["a"; "b"; "c"]`.
`DcpuAsmExample.ml` contains some extreme example of local blocks.

### Macros

There are no separate macro feature in DcpuAsm, but you can trivially make a
simple macro with local blocks and Ocaml `let` construct.

DcpuAsm *does* support some additional features for macros. While `VAL` and
`STR` allows the insertion of arbitrary immediate value or string, you cannot
insert registers or other expression in this way. Therefore DcpuAsm supports a
quoted form `#e` of an Ocaml expression, which can appear in:

* Expressions (e.g. `3 + #reg`). The expression should evaluate to the
internal representation of expression; an immediate should be quoted using
`IMM` prefix.
* Labels (e.g. `%#labelname`). The expression should evaluate to a string.
While you can use any character in the label name (even an empty string is
permitted), you should restrict yourself to the normal identifier. Local
labels, for example, start with `.` character internally. `DcpuAsm.gensym`
function can be used to generate unique symbols.

### Caveats / To-do List

As always you should expect the following caveats:

* You cannot use codes like `[(%label1, ...); (%label2, ...); ...]` in the
normal Ocaml code because `(%` will be treated as one token. (`( %label1`
etc. will work.) The assembly syntax is specially crafted to separate those
two, however. Any suggestions about this problem are welcomed.
* `DAT` is missing `*`-prefixed items.