Self-made assembler and Accumulator CPU
https://github.com/owl-from-hogvarts/csa-lab3


# Thanks

Special thanks to [Zerumi Coder](https://github.com/zerumi), whose purest soul helped me on this long journey! The Guy who supported me at every tricky junction of the long night road. The Guy who lit the way for me, who enlightened me!

Thank you, Zerumi!

Thanks to [Lannee](https://github.com/lannee) the Great, who supported the Magnificent Order of Rust!

Thanks to [Local Piper](https://github.com/localPiper/) the Funniest, who didn't let me give up! His majesty supported me with marvelous jokes!

Thanks to all other fellow comrades who stood with me, who fought bravely, who have won this battle!

# Variant

- Тернавский Константин Евгеньевич. P3206

```
asm | acc | neum | mc -> hw | tick -> instr | struct | stream | port | pstr | prob2 | cache
```

- Base variant

## Transcript

- assembler
- accumulator
- Von Neumann (same memory for commands and data)
- Microcode
- accurate up to tick
- code is stored as High-Level structure
- Stream IO (no interrupts)
- IO devices addressed by ports. Separate IO instruction
- Pascal strings (length + content)
- Prob 2. Even Fibonacci numbers
- Cache (***not implemented***)

# Table of content

- [Thanks](#thanks)
- [Variant](#variant)
  - [Transcript](#transcript)
- [Table of content](#table-of-content)
- [Language](#language)
  - [Syntax](#syntax)
  - [Semantics](#semantics)
  - [Literals](#literals)
  - [Argument types](#argument-types)
  - [Assembler directives](#assembler-directives)
- [ISA](#isa)
  - [Instruction format](#instruction-format)
  - [Instruction pipeline](#instruction-pipeline)
  - [Operand types](#operand-types)
- [Memory](#memory)
- [CPU Architecture](#cpu-architecture)
  - [Data path](#data-path)
    - [Registers](#registers)
    - [ALU](#alu)
  - [Control unit](#control-unit)
- [Stats](#stats)

# Language

## Syntax

> `number` is a 16-bit number
>
> `number(n)` is an n-bit number

Assembly:
```ebnf
program ::= lines

lines ::= line | line lines

line ::= new_line | statement new_line

statement ::= item | label item

label ::= word ":"

item ::= empty | command | directive

// command names are matched case-insensitive
command ::= command_none
| command_address
| command_immediate
| command_port

command_none ::= "inc"
| "shift_left"
| "shift_right"
| "nop"
| "halt"

command_address ::= opcode_address address

opcode_address ::= "load"
| "store"
| "add"
| "and"
| "cmp"
| "jzc"
| "jzs"
| "jz" // alias for jzs
| "jcc"
| "jcs"
| "jc" // alias for jcs
| "jump"

address ::= address_relative
| address_absolute
| address_indirect

address_relative ::= actual_address

address_absolute ::= "!" actual_address

address_indirect ::= "(" actual_address ")"

actual_address ::= word | number

command_immediate ::= "andi" number

command_port ::= opcode_port port

opcode_port ::= "in" | "out"

port ::= number(8)

// supports numbers:
// decimal: 145, 0001
// hex: 0xaf
// bin: 0b010101
number ::= "^(0[xb])?([\dabcdef_]+)"

directive ::= directive_word | directive_org

directive_word ::= "word" word_arguments

word_arguments ::= word_argument | word_argument word_arguments

word_argument ::= number(32) | label

directive_org ::= "org" number

```

> Note: space symbols carry no meaning and are skipped.
> Space symbols are defined as the following set: `" "`

## Semantics

- strategy of computation: *sequential*
- label's scope: *global*

## Literals

For number endianness, see the [Memory](#memory) section.

> We use a notation for number types similar to Rust's:
> the first letter denotes the type and affects sign extension:
> - `u`: no sign extension happens. The value is treated as **U**nsigned
>
> The letter is followed by a number, which gives the number of bits
> available to represent the value.
>
> Examples:
> - `u32` - 32-bit unsigned number
> - `u16` - 16-bit unsigned number

All literals are numbers; they differ only in their range of values. All numbers share the same syntax. The range is determined by the usage context: the command's argument type defines how the literal is interpreted. For details, see [Argument types](#argument-types).

Number literals support the following prefixes:
- `0x` - for *hex* numbers
- `0b` - for *binary* numbers

Example:
```
decimal: 145, 0001
hex: 0xaf
bin: 0b010101
```


Numbers without prefix are treated as *decimals*.

Numbers *may* contain any number of `_` characters at any point after the prefix (or anywhere if there is no prefix).

Example:

- `4000`
- `4_000`
- `4_____000`
- `___4_000___`
- `0x____fa0`

all mean the decimal value `4000`.
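The literal rules above can be sketched as a small parser. This is a minimal Python sketch, not the project's actual implementation: the function name `parse_number`, the `bits` parameter, the ASCII-only digit class and the end-of-string anchor are all assumptions made for illustration.

```python
import re

# Optional "0x"/"0b" prefix, then digits and underscores (assumed ASCII-only).
NUMBER_RE = re.compile(r"^(0[xb])?([0-9a-f_]+)$")
BASES = {None: 10, "0x": 16, "0b": 2}


def parse_number(text: str, bits: int = 16) -> int:
    """Parse a number literal and check that it fits into `bits` unsigned bits."""
    match = NUMBER_RE.match(text)
    if match is None:
        raise ValueError(f"not a number literal: {text!r}")
    prefix, body = match.groups()
    digits = body.replace("_", "")  # underscores are purely visual
    if not digits:
        raise ValueError(f"literal has no digits: {text!r}")
    # int() rejects digits that are invalid for the base, e.g. "af" in base 10.
    value = int(digits, BASES[prefix])
    if value >= 1 << bits:
        raise ValueError(f"{text!r} does not fit into u{bits}")
    return value


print(parse_number("0x____fa0"))  # 4000
```

Passing `bits=8` or `bits=32` would cover the `port` and `word` contexts in the same way.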

## Argument types

- `none` - the command takes no arguments. Supplying one results in an error.
- `port` - the command requires a single number which denotes the IO device's address. The number is treated as `u8`.
- `immediate` - the command requires a single number. The number is treated as `u16`.
- `label` - a special argument type. It requires a single word. A word is a sequence of Unicode letters; it may contain any number of `_` in any position. Used within the composite type `address` and with the [`word` directive](#assembler-directives).
- `address` - a composite type. See the notes and table below.
- There are other types. They are special and are used in conjunction with [assembler directives](#assembler-directives).

An `address` is either a number treated as `u16` or a label. An address accepts modifiers that switch the addressing mode.

In the following table, strings enclosed in `""` are literal characters present in the source code, `|` means *alternative*, and `()` are used to *group items*.

That is, `"!" (u16 | label)` means an exclamation mark followed by either a number literal or a label, where the number literal is interpreted as a 16-bit unsigned number.

Each addressing mode in this table corresponds to the matching [operand type of the CPU](#operand-types).



| Mode | Syntax | Example |
| -------- | ------ | ------- |
| Relative | `u16 \| label` | `load 0x55`<br>`load 145`<br>`load some_label` |
| Absolute | `"!" (u16 \| label)` | `store !0x55`<br>`store !some_label` |
| Indirect | `"(" u16 \| label ")"` | `load (some_label_ptr)`<br>`load (0x55)` |
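The three address forms can be told apart by their first character. The sketch below is illustrative only: `Address`, `parse_target` and `parse_address` are hypothetical names, and treating any word made purely of letters and `_` as a label (so that hex digits need the `0x` prefix) is an assumption.

```python
import re
from typing import NamedTuple, Union


class Address(NamedTuple):
    mode: str                # "relative", "absolute" or "indirect"
    target: Union[int, str]  # u16 value or label name


def parse_target(text: str) -> Union[int, str]:
    """Return a label name, or the u16 value of a number literal."""
    if re.fullmatch(r"[^\W\d]+", text):  # Unicode letters and "_" only
        return text
    base = 10
    if text.startswith("0x"):
        base, text = 16, text[2:]
    elif text.startswith("0b"):
        base, text = 2, text[2:]
    if not re.fullmatch(r"[0-9a-f_]+", text):
        raise ValueError(f"bad address target: {text!r}")
    value = int(text.replace("_", ""), base)
    if value > 0xFFFF:
        raise ValueError("address does not fit into u16")
    return value


def parse_address(text: str) -> Address:
    """Classify an address argument by its modifier."""
    if text.startswith("!"):
        return Address("absolute", parse_target(text[1:]))
    if text.startswith("(") and text.endswith(")"):
        return Address("indirect", parse_target(text[1:-1]))
    return Address("relative", parse_target(text))


print(parse_address("!some_label"))
```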


## Assembler directives

- `data` - a special argument type used with the assembler directive `word`. It requires *one or **more** numbers*, each separated by at least one space. Each number is treated as `u32`. That is, although the value `0xff` fits into one byte, `word 0xff` occupies a whole [memory cell](#memory). *Notice that no sign extension takes place! The value is placed as is*:
`0x00_00_00_ff`

Assembler directives are not represented in the CPU's memory. They are special commands intended to help you write assembly code.



| Directive | Allowed argument types | Comment |
| --------- | ---------------------- | ------- |
| `word` | `data \| label` | Places numbers into memory as is. Can be used with labels to make pointers, i.e. `word some_label` places an address and NOT the content at the address which `some_label` refers to. |
| `org` | `u16` | Instructs the assembler to place the next code item (be it a raw value or a command) into the cell with address ADDRESS. Subsequent code items are placed after ADDRESS one by one. |

Example:

```asm
org 0x04f
VAR1: word 0x45a9 0xff
add 0xf
cmp VAR1
```
Here `0x0000_45a9` is placed in memory as is at address `0x004f` and `0x0000_00ff` at `0x0050`; `add 0xf` goes to `0x0051`, and so on.
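The address assignment in this example can be reproduced with a short sketch. It is not the project's assembler: the `layout` function, the tuple-based item format and the starting address of zero when no `org` is given are assumptions made for illustration.

```python
def layout(items):
    """Assign a cell address to every code item.

    `items` is a list of (label, kind, payload) tuples (hypothetical format):
      (None, "org", 0x4f)       - move the placement cursor
      ("VAR1", "word", [1, 2])  - one cell per number
      (None, "cmd", "add 0xf")  - one cell per command
    """
    memory = {}
    labels = {}
    address = 0  # assumed default when no org directive is present
    for label, kind, payload in items:
        if kind == "org":
            address = payload
        if label is not None:
            labels[label] = address  # a label names the next placed cell
        if kind == "word":
            for value in payload:
                memory[address] = value
                address += 1
        elif kind == "cmd":
            memory[address] = payload
            address += 1
        elif kind != "org":
            raise ValueError(f"unknown item kind: {kind!r}")
    return memory, labels


program = [
    (None, "org", 0x04F),
    ("VAR1", "word", [0x45A9, 0xFF]),
    (None, "cmd", "add 0xf"),
    (None, "cmd", "cmp VAR1"),
]
memory, labels = layout(program)
for address in sorted(memory):
    print(hex(address), memory[address])
```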

# ISA

This section describes opcodes and operand type suggested for use with them.

Notice that every command may theoretically work with every operand type except `none`, though that has not been tested and may lead to undefined behaviour.

Providing the `none` operand type to a command which requires any other type makes the CPU panic.

Commands which suggest the `none` operand type simply ignore any operand, although the operand fetch is still executed. So you *can*, but *should not*, specify any operand type other than `none` for commands which expect `none`.

For this table, let's introduce a special operand type `operand`. It *requires* the operand type to be any of `Relative|Indirect|Immediate|Absolute`. Notice that `none` is ***forbidden*** for this special type.

```
IN          immediate - read data from IO device
OUT         immediate - write data to IO device

LOAD        operand   - load value into accumulator
STORE       operand   - store value from accumulator into memory cell

ADD         operand   - well... add a number?
INC         none      - add 1 to accumulator

// (to check for even values by applying 0x1 mask)
AND         operand
CMP         operand   - subtract number from accumulator without
                        storing result anywhere. Sets status flags.
                        Useful for branching
SHIFT_LEFT  none
SHIFT_RIGHT none

JZC         operand   - Jump if Zero Clear
JZS         operand   - Jump if Zero Set
JCC         operand   - Jump if Carry Clear
JCS         operand   - Jump if Carry Set
JUMP        operand   - Unconditional jump

NOP         none      - does nothing
HALT        none      - Stops the simulation
```

The assembler supports immediate-argument variants of some instructions, namely:
```
AND -> ANDI // useful for masking
```

For more, see the [Syntax](#syntax) section.

## Instruction format

Every instruction occupies exactly one [memory cell](#memory).

If represented in binary, every instruction would have the following format:



| opcode | argument type | argument |
| ------ | ------------- | -------- |
| 1 byte | 1 byte        | 2 bytes  |

Total: 4 bytes
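Since the code is stored as a high-level structure, the layout above is a hypothetical binary form. A minimal sketch of how such a cell could be packed follows; the numeric opcode and argument-type values and the big-endian byte order are assumptions, and `encode`/`decode` are illustrative names.

```python
import struct

# ">BBH": big-endian, 1-byte opcode, 1-byte argument type, 2-byte argument.
CELL_FORMAT = ">BBH"


def encode(opcode: int, arg_type: int, argument: int) -> bytes:
    """Pack one instruction into a 4-byte cell (raises struct.error if a field overflows)."""
    return struct.pack(CELL_FORMAT, opcode, arg_type, argument)


def decode(cell: bytes):
    """Unpack a 4-byte cell into (opcode, arg_type, argument)."""
    return struct.unpack(CELL_FORMAT, cell)


# 0x03 and 0x01 are made-up codes, used only to show the field order.
cell = encode(0x03, 0x01, 0x0055)
print(cell.hex())  # 03010055
```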

## Instruction pipeline

1) Instruction fetch
   1) fetches instruction from memory to cmd register
2) Operand decode
   1) determines type of operand
   2) loads operand
3) Execution
   1) determines microinstruction number by opcode
   2) executes instruction

## Operand types

```
- None: jumps straight to command execution
- Absolute: operand -> address -> [mem] -> data
- Relative: pc + operand -> address -> [mem] -> data
- Indirect: pc + operand -> address -> [mem] -> data -> address -> [mem] -> data
- Immediate: operand -> data

```
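The operand fetch for each type can be sketched as a single function. This is an illustration rather than the CPU's microcode: `resolve_operand` is a hypothetical name, memory is modelled as a plain dict, address arithmetic is assumed to wrap at 16 bits, and the last step of `Indirect` is assumed to be a second memory read.

```python
ADDRESS_MASK = 0xFFFF  # the address register is u16


def resolve_operand(mode: str, operand: int, pc: int, memory: dict):
    """Return the data an instruction would operate on, or None for `none`."""
    if mode == "none":
        return None  # jumps straight to command execution
    if mode == "immediate":
        return operand
    if mode == "absolute":
        return memory[operand & ADDRESS_MASK]
    if mode == "relative":
        return memory[(pc + operand) & ADDRESS_MASK]
    if mode == "indirect":
        pointer = memory[(pc + operand) & ADDRESS_MASK]
        return memory[pointer & ADDRESS_MASK]
    raise ValueError(f"unknown operand type: {mode!r}")


memory = {0x10: 0xAA, 0x12: 0x10, 0x20: 7}
print(resolve_operand("indirect", 2, 0x10, memory))  # 170
```

In the usage line, `pc + operand` is `0x12`, the cell there holds the pointer `0x10`, and the cell at `0x10` holds `0xAA`.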

# Memory

This CPU uses the von Neumann memory model: both data and code are stored in the same memory.

> However, you can neither interpret data as an instruction nor an instruction as data: this is a limitation of the current implementation.

Memory consists of `2**16` memory cells. Each memory cell holds either a single 32-bit *big-endian* number *without* sign extension or an instruction.

The whole memory is addressable by a `u16` address on a per-cell basis. That is, by referring to `0x0000` you can fetch a `u32` number or an instruction.

Address space starts from ***zero***.

| address | content |
| -------- | ------------- |
| `0x0000` | `0x0000_00ff` |
| `0x0001` | `0xdead_beaf` |
| ... | ... |
| `0xffff` | `0x0000_0000` |
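A cell-addressed memory of this shape can be modelled in a few lines. The sketch below is an assumption-laden illustration: the `Memory` class is hypothetical, cells are assumed to start zeroed, and instructions are stood in for by arbitrary non-integer objects.

```python
class Memory:
    """2**16 cells, each holding a u32 number or an instruction object."""

    SIZE = 2 ** 16

    def __init__(self):
        self.cells = [0] * self.SIZE  # assumed initial content

    def _check(self, address: int) -> None:
        if not 0 <= address < self.SIZE:
            raise IndexError(f"address {address:#x} is outside the u16 range")

    def read(self, address: int):
        self._check(address)
        return self.cells[address]

    def write(self, address: int, value) -> None:
        self._check(address)
        if isinstance(value, int) and not 0 <= value <= 0xFFFF_FFFF:
            raise ValueError("number does not fit into u32")
        self.cells[address] = value


mem = Memory()
mem.write(0x0000, 0xFF)
# Big-endian view of the cell: most significant byte first.
print(mem.read(0x0000).to_bytes(4, "big").hex())  # 000000ff
```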

# CPU Architecture

## Data path

### Registers

Registers support either `u32` or `u16` values. If you attempt to write `u32` value into `u16`-capable destination (that is either register or ALU) then sixteen most-significant bits are silently discarded.

If you attempt to write `u16` value into `u32`-capable destination, then value is zero-extended to 32 bits.

> *Notice that NO sign-extension takes place!*
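Both width conversions reduce to bit masks. A small sketch with hypothetical helper names `to_u16` and `to_u32`:

```python
def to_u16(value: int) -> int:
    """Write into a u16 destination: the sixteen most-significant bits are dropped."""
    return value & 0xFFFF


def to_u32(value: int) -> int:
    """Write into a u32 destination: the upper bits are zero, never copies of the sign bit."""
    return value & 0xFFFF_FFFF


print(hex(to_u16(0xDEAD_BEAF)))  # 0xbeaf
print(hex(to_u32(0xFFFF)))       # 0xffff
```

A sign-extending design would turn `0xFFFF` into `0xFFFF_FFFF`; here it stays `0x0000_FFFF`.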


- accumulator (`u32`) (least-significant byte is connected to IO)
- data (connected to memory) (MemoryItem: `u32` | [`Command`](./isa/src/lib.rs))
- status (zero, carry)
- address (`u16`)
- program counter (`u16`)
- cmd(opcode, opcode_type, arg: `u16`)

### ALU

ALU operates on two `u32` values, outputting `u32` result and optionally setting `status` register with `zero` and `carry` flags.
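As an illustration of this contract, here is a sketch of a single ALU operation, addition. The names `AluResult` and `alu_add` are hypothetical, and defining `carry` as an overflow out of bit 31 is an assumption, since the exact flag semantics are not spelled out here.

```python
from typing import NamedTuple

U32_MASK = 0xFFFF_FFFF


class AluResult(NamedTuple):
    value: int   # u32 result
    zero: bool   # result is 0
    carry: bool  # assumed: the sum did not fit into 32 bits


def alu_add(left: int, right: int) -> AluResult:
    """Add two u32 values, returning the truncated sum and both flags."""
    total = (left & U32_MASK) + (right & U32_MASK)
    value = total & U32_MASK
    return AluResult(value, value == 0, total > U32_MASK)


print(alu_add(0xFFFF_FFFF, 1))
```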

![](./images/data_path.svg)

## Control unit

![](./images/control_unit.svg)

# Stats

```
| Full name | alg | loc | bytes | instr | exec_instr | tick | variant |
|Тернавский Константин Евгеньевич | hello_world | 35 | - | 13 | 105 | 647 | asm | acc | neum | mc -> hw | tick -> instr | struct | stream | port | pstr | prob2 | cache|
|Тернавский Константин Евгеньевич | hello_username | 168 | - | 81 | 434 | 2706 | asm | acc | neum | mc -> hw | tick -> instr | struct | stream | port | pstr | prob2 | cache|
|Тернавский Константин Евгеньевич | prob2 | 137 | - | 48 | 630 | 3896 | asm | acc | neum | mc -> hw | tick -> instr | struct | stream | port | pstr | prob2 | cache|
```