{"id":21072315,"url":"https://github.com/g0mb4/tas","last_synced_at":"2026-04-22T08:31:28.013Z","repository":{"id":148469638,"uuid":"258236794","full_name":"g0mb4/tas","owner":"g0mb4","description":"Toy two pass assembler in C.","archived":false,"fork":false,"pushed_at":"2022-12-13T08:56:26.000Z","size":1247,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-14T03:11:07.594Z","etag":null,"topics":["educational","two-pass-assembler"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/g0mb4.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-23T14:45:33.000Z","updated_at":"2022-12-13T08:56:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"b2e2414c-08db-4e61-b55d-23bb6bea3b0b","html_url":"https://github.com/g0mb4/tas","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/g0mb4/tas","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/g0mb4%2Ftas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/g0mb4%2Ftas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/g0mb4%2Ftas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/g0mb4%2Ftas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/g0mb4","download_url":"https://codeload.github.com/g0mb4/tas/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/g0mb4%2Ftas/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32127811,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-22T07:37:52.372Z","status":"ssl_error","status_checked_at":"2026-04-22T07:37:51.635Z","response_time":58,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["educational","two-pass-assembler"],"created_at":"2024-11-19T18:56:08.961Z","updated_at":"2026-04-22T08:31:27.996Z","avatar_url":"https://github.com/g0mb4.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"Toy Two Pass Assembler\n======================\nThe project is based on Yonatan Zilpa's exercise. A brief explanation can be found [here](https://www.magmath.com/english/programming/c_programming_language/projects/two_pass_assembler.php).\nThe majority of the following is just a direct copy of that site.\nSome differences:\n+ The hexadecimal (base 16) numeral system is used instead of octal.\n+ .entry MAIN must be declared explicitly.\n+ No relative addressing.\n+ The generated object code file uses a different format (it includes the entries and the externals).\n\nDocumentation can be found [here](https://g0mb4.github.io/tas/).\n\nA working virtual machine was created for this project in order to run the assembled programs. The machine can be found here: [tvm](https://github.com/g0mb4/tvm). 
\n\n# \"Hardware\"\nOur computer architecture consists of a Central Processing Unit (CPU), registers and Random Access Memory (RAM), where part of the memory is used as a stack. The size of each word in memory is 16 bits. Arithmetic is carried out using the two's complement method. Our machine can only handle integers (positive or negative); it doesn't handle real numbers.\n\n## Registers\nOur machine includes the following registers:\n+ Eight general registers (**r0**, **r1**, **r2**, **r3**, **r4**, **r5**, **r6**, **r7**).\n+ One Program Counter register (**pc**).\n+ One Stack Pointer register (**sp**).\n+ One Status register (**psw** - Program Status Word) which has two flags: **carry flag** and **zero flag**.\n\nAll registers are 16 bits in size.\nThe first two bits of the PSW register are C and Z, respectively.\nCharacters are coded in ASCII.\n\n## Memory\nThe size of memory is 2000 words (each word is 16 bits in size).\n\n## Stack\nThe stack is at the end of the main memory: it starts at memory address 1999 (07cf hex, in words) and grows downwards. The size of the stack is 16 words.\n\n## Initialization\nOn startup all registers have a value of zero, including the flags. The contents of the memory are also zero.\n\n# Instructions\nIn our machine, an instruction is a word (16 bits in size) that carries information about the operator and operands. Although an instruction is a string of 16 bits, it can be divided into fields. The following table provides further information about the instruction. 
The bit positions are given in decimal.\n\n| Fields | Operation | Source Operand            ||  Destination Operand      ||\n| ------ | --------- | --------------- | ---------| ----------------| -------- |\n|        |           | Addressing Mode | Register | Addressing Mode | Register |\n| Bits   | 15-12     | 11-9            | 8-6      | 5-3             | 2-0      |\n\nThe following table maps each operator's name to its corresponding instruction code (opcode, in hexadecimal).\n\n| Operator     | Opcode |\n| ------------ | ------ |\n|  ``` mov ``` | 0 \t    |\n|  ``` cmp ``` | 1\t    |\n|  ``` add ``` | 2\t    |\n|  ``` sub ``` | 3\t    |\n|  ``` mul ``` | 4\t    |\n|  ``` div ``` | 5\t    |\n|  ``` lea ``` | 6\t    |\n|  ``` inc ``` | 7\t    |\n|  ``` dec ``` | 8\t    |\n|  ``` jnz ``` | 9\t    |\n|  ``` jnc ``` | a\t    |\n|  ``` shl ``` | b\t    |\n|  ``` prn ``` | c\t    |\n|  ``` jsr ``` | d\t    |\n|  ``` rts ``` | e\t    |\n|  ``` hlt ``` | f\t    |\n\nAll operators are written in lower case letters; the meaning of these operators is detailed later.\n\n+ **Bits 9-11**: This field holds the addressing mode of the source operand. Depending on the value of this field, the instruction may be followed by an additional word (first additional word).\n+ **Bits 6-8**: This field holds the register of the source operand. The field maps its numeric value n to register rn.\n\n   *Notice*: If the addressing mode of the source operand does not require a register, then the source register field is not in use. In such a case the numeric value of the field (bits 6-8) is equal to zero.\n+ **Bits 3-5**: This field holds the addressing mode of the destination operand. Depending on the value of this field, the instruction may be followed by an additional word (second additional word).\n+ **Bits 0-2**: This field refers to the register of the destination operand. 
The field (bits 0-2) maps its numeric value n to register rn.\n\n   *Notice*: If the addressing mode of the destination operand does not require a register, then the destination register field is not in use. In such a case the numeric value of the field (bits 0-2) is equal to zero.\n\nThere are five addressing modes in our assembly language; some of these modes require additional information, i.e. an additional word. The following table describes all of the addressing modes.\n\n| First Word                                        ||| Additional Word | Operand | Way of Writing | \u0026nbsp;\u0026nbsp;Example\u0026nbsp;\u0026nbsp; |\n| ----------- | ------------------ | ---------------- | --------------- | ------- | -------------- | ------- |\n| Field Value | Name               | Register         |                 |         |                |         |\n| 0 | Instant addressing | zero (not in use) | yes | The numeric value of the operand is the numeric value of the additional word. | The operand is a number preceded by the '#' sign. | ``` mov #-1,r2 ``` |\n| 1 | Direct addressing | zero (not in use) | yes |\tThe additional word contains a memory address. The numeric value of the operand is the value stored at this address. | The operand is a label, either declared or expected to be declared later in the file.\t| ``` mov x,r2 ``` |\n| 2 | Indirect addressing | zero (not in use) | yes | The additional word contains a memory address. The value stored at this address is also a memory address. The value stored at the second address is the numeric value of the operand. | Indirect addressing is indicated by the '@' sign which appears just before the label. The label is declared in the same way as in the direct addressing mode. | ``` mov @x,r2 ``` |\n| 3 | Direct register addressing | n (register number, 0-7) | no | Register rn contains the value of the operand. | The operand is a legal register name. 
| ``` mov r1,r2 ``` |\n| 4 | Indirect register addressing | n (register number, 0-7) | no | Register rn contains a memory address. This memory address contains the operand. | The operand is a legal register name preceded by the '@' sign. | ``` mov @r1,r2 ``` |\n\n## Machine Instruction Characterization\nMachine instructions may be classified into three different classes (according to the number of operands that appear in each instruction).\n\n## First Class of Operators\nThe first class contains all machine instructions that take two operands. Any machine instruction that belongs to this class contains one of the following operators:\n```\n        mov, cmp, add, sub, mul, div, lea, shl\n```\nThe following table provides further explanation on the operational aspects of these operators:\n\n| Numeric Code | Operator | Description | \u0026nbsp;\u0026nbsp;Example\u0026nbsp;\u0026nbsp;| Example Description |\n| ------------ | -------- | ----------- | ------- | ------------------- |\n| 0 | ``` mov ``` | Copies the value of the source operand (the first operand) to the destination operand (the second operand).\t|  ``` mov A, r1 ``` | Copy the value of A to register r1. |\n| 1\t| ``` cmp ``` |\tCompares two operands. The cmp operator subtracts the destination operand from the source operand without saving the result of the subtraction; it then updates the zero flag, Z, in the status register, PSW. | ``` cmp A, r1 ``` | If the values of A and r1 are equal, then the zero flag Z, in the status register PSW, is turned on. Otherwise the zero flag is turned off. |\n| 2 | ``` add ``` |\tThe destination operand is assigned the value of the source operand plus the value of the destination operand. | ``` add A, r0 ``` | Register r0 gets the sum of r0 and A. |\n| 3\t| ``` sub ``` |\tThe destination operand is assigned the value of the destination operand minus the value of the source operand. 
|  ``` sub #3, r1 ``` | Register r1 is assigned the value of r1 minus 3. |\n| 4 | ``` mul ``` | The destination operand is assigned the value of the source operand times the value of the destination operand. |  ``` mul A, r2 ``` | Register r2 is assigned A times r2. |\n| 5 | ``` div ``` |\tThe destination operand is assigned the value of the destination operand divided by the source operand. | ``` div A, r2 ``` | Register r2 is assigned r2/A. |\n| 6 | ``` lea ``` |\tAcronym for 'load effective address'. This operation loads the memory address marked by the label in the first operand into the destination operand. |  ``` lea ABC, r1  ``` | The memory address of label ABC is assigned to register r1. |\n| b | ``` shl ``` | Shifts the bits of the source operand to the left. The number of shifts is determined by the value of the destination operand. |  ``` shl r1, #1 ``` | Register r1 is shifted 1 bit to the left. |\n\n## Second Class of Operators\nThe second class contains all machine instructions that take one operand. In such cases there is no source operand, thus bits 6-11 are meaningless (their value is zero). Any machine instruction in this class contains one of the following operators:\n```\n        inc, dec, jnz, jnc, prn, jsr\n```\nThe following table provides further explanation on the operational aspects of these operators:\n\n| Numeric Code | Operator | Description | \u0026nbsp;\u0026nbsp;Example\u0026nbsp;\u0026nbsp; | Example Description |\n| ------------ | -------- | ----------- | ------- | ------------------- |\n| 7 | ``` inc ``` | The operand is increased by one. |  ``` inc r2 ``` | Register r2 is assigned r2 plus 1. |\n| 8 | ``` dec ``` |\tThe operand is decreased by one. |  ``` dec r2 ``` | Register r2 is assigned r2 minus 1. |\n| 9 | ``` jnz ``` |\tAcronym: jump if not zero. The Program Counter register PC is assigned the operand if the Z flag, in the Program Status Word register PSW, is not set (is zero). 
|  ``` jnz LINE ``` | If the Z flag (in the PSW register) is not set, then the PC register is assigned LINE. |\n| a | ``` jnc ``` |\tAcronym: jump if not carry. The Program Counter register PC is assigned the operand if the C flag, in the Program Status Word register PSW, is not set (is zero). |  ``` jnc LINE ``` | If the C flag (in the PSW register) is not set, then the PC register is assigned LINE. |\n| c | ``` prn ``` |\tPrints the ASCII equivalent of the operand to the standard output file (stdout). |  ``` prn r1 ``` | The ASCII character equivalent to the value stored in r1 is printed to standard output. |\n| d | ``` jsr ``` | Calls a subroutine: pushes the PC register onto the run-time stack and assigns the operand to the Program Counter register PC.\t|  ``` jsr FUNC ``` | stack[SP] = PC\u003cbr\u003e SP = SP-1 \u003cbr\u003e PC = FUNC |\n\n## Third Class of Operators\nThe third class contains all machine instructions that take no operands. In such cases bits 0-11 are meaningless (their value is zero). Any machine instruction in this class contains one of the following operators:\n```\n        rts, hlt\n```\nThe following table provides further explanation on the operational aspects of these operators:\n\n| Numeric Code | Operator | Description | \u0026nbsp;\u0026nbsp;Example\u0026nbsp;\u0026nbsp; | Example Description |\n| ------------ | -------- | ----------- | ------- | ------------------- |\n| e | ``` rts ``` | Pops a value from the run-time stack and moves this value to the Program Counter register. |  ``` rts ``` | SP = SP+1 \u003cbr\u003e PC = stack[SP] |\n| f | ``` hlt ``` | Halts the program. 
| ``` hlt  ``` | Halting the program. |\n\n## Legal addressing modes\nThe following table lists the legal addressing modes for the source and destination operands of each operator.\n\n| Operator  | Legal Addressing Modes for the Source Operand | Legal Addressing Modes for the Destination Operand |\n| --------- | --------------------------------------------- | -------------------------------------------------- |\n| ```mov``` | 0,1,2,3,4\t                                | 1,2,3,4                                          |\n| ```cmp``` | 0,1,2,3,4\t                                | 0,1,2,3,4                                        |\n| ```add``` | 0,1,2,3,4\t                                | 1,2,3,4                                          |\n| ```sub``` | 0,1,2,3,4\t                                | 1,2,3,4                                          |\n| ```mul``` | 0,1,2,3,4\t                                | 1,2,3,4                                          |\n| ```div``` | 0,1,2,3,4\t                                | 1,2,3,4                                          |\n| ```lea``` | 1\t                                            | 1,2,3,4                                          |\n| ```inc``` | No source operand\t                            | 1,2,3,4                                          |\n| ```dec``` | No source operand\t                            | 1,2,3,4                                          |\n| ```jnz``` | No source operand\t                            | 1,2,4                                            |\n| ```jnc``` | No source operand\t                            | 1,2,4                                            |\n| ```shl``` | 1,2,3,4\t                                    | 0,1,2,3,4                                        |\n| ```prn``` | No source operand\t                            | 0,1,2,3,4                                        |\n| ```jsr``` | No source operand\t                            | 1,2,4                                            |\n| 
```rts``` | No source operand\t                            | No destination operand                             |\n| ```hlt``` | No source operand\t                            | No destination operand                             |\n\n## Flags\nThe following table shows which flags are modified by each instruction.\n\n| Operator     | Zero Flag Modified | Carry Flag Modified |\n| ------------ | ------------------ | ------------------- |\n|  ``` mov ``` | No           \t    | No                  |\n|  ``` cmp ``` | Yes            \t| No                  |\n|  ``` add ``` | Yes            \t| Yes                 |\n|  ``` sub ``` | Yes            \t| Yes                 |\n|  ``` mul ``` | Yes            \t| Yes                 |\n|  ``` div ``` | No             \t| No                  |\n|  ``` lea ``` | No             \t| No                  |\n|  ``` inc ``` | Yes            \t| No                  |\n|  ``` dec ``` | Yes            \t| No                  |\n|  ``` jnz ``` | No            \t    | No                  |\n|  ``` jnc ``` | No            \t    | No                  |\n|  ``` shl ``` | Yes            \t| Yes                 |\n|  ``` prn ``` | No            \t    | No                  |\n|  ``` jsr ``` | No            \t    | No                  |\n|  ``` rts ``` | No            \t    | No                  |\n|  ``` hlt ``` | No            \t    | No                  |\n\n# Statements\nOur assembly language consists of statements separated by the new line character '\\\\n'. A source file therefore appears as lines of statements, each statement on its own line.\nOur assembly language has four types of statements. 
These statements are described in the following table.\n\n| Type of statement | General Explanation |\n| ----------------- | ------------------- |\n| Empty Statement | A line with this kind of statement may contain only whitespace: tab characters '\\\\t' or space characters ' '. |\n| Comment Statement\t| The first character in a line with this statement is the semicolon ';' character. This line should be completely ignored by the assembler. |\n| Directive Statement\t| This statement is a directive to the assembler program. It does not generate a machine instruction. |\n| Operation Statement | This statement generates a machine instruction that is to be executed by the CPU. The statement represents a machine instruction in symbolic form. |\n\n## Directive Statement\nA directive statement has the following form:\nA directive statement may optionally start with a label; the label has to follow certain syntax rules (described later). With or without a label, a directive name preceded by a dot '.' character must be included. No whitespace is allowed between the '.' character and the directive name. If the directive does include a label, then at least one whitespace character separates the label from the '.' character. The directive parameters follow the directive name on the same line, separated from it by whitespace (the number of parameters is determined by the type of the directive). There are four types of directive:\n\n1. .data\n\n    The parameter(s) of '.data' is a list of legal numbers separated by a comma ',' character. For example:\n```\n.data    +7,-57 ,17   ,    9\n```\n    Notice that any number of whitespace characters may appear between the number(s) and the comma character(s). 
However, a comma character must appear between every two numeric values.\n    The '.data' directive statement directs the assembler to allocate space in its data image where the numeric parameters are to be stored. It also directs the assembler to advance the data counter by the number of parameters (of the '.data' directive). If the '.data' directive has a label name, then this label name is assigned the value of the data counter (before it was advanced) and is inserted into the symbol table. This way we can refer to a certain place in the data image using the label name. For instance, if we write\n```\nXYZ:    .data   +7,-57,17,9\n    mov \tXYZ, r1\n```\n    then register r1 is assigned the value +7. If we instead write\n```\nlea    XYZ, r1\n```\n    then r1 is assigned the address (in the data image) that stores the +7 value.\n\n2. .string\n\n    The '.string' directive statement takes exactly one legal string as its parameter. The meaning of the '.string' directive statement is similar to that of the '.data' directive statement. The ASCII characters that compose the string are coded to their numeric ASCII values and inserted into the data image in order. A zero value is inserted at the end, to mark the end of the string. The value of the data counter is increased by the length of the string plus one. If the line includes a label name, then the value of the label name points to the location in memory that stores the ASCII code of the first character of the string, in the same way as for the '.data' directive. For instance the directive statement\n```\nABC:    .string    \"abcdef\"\n```\n    allocates an array of characters of length 7 starting from the address stored in the ABC label name. 
This \"array\" is initialized to the ASCII values of the characters 'a', 'b', 'c', 'd', 'e', 'f', in that order, and is terminated by a zero value appended to the end of the array.\n\n3. .entry\n\n    The '.entry' directive statement takes exactly one parameter. This parameter is a label name defined in the very same file. The purpose of the '.entry' directive statement is to handle cases where a label name defined in an assembly source file A needs to be referred to by other assembly source file(s) B, C, D, etc. In this case the '.entry' directive statement, written in file A, takes the label name as its single parameter. For instance, if an assembly source file A contains the following lines\n```\n.entry\tHELLO\nHELLO:  add\t\t#1, r1\n```\n    then other assembly source file(s) may refer to the HELLO label name. Notice that a label at the beginning of the '.entry' directive is meaningless.\n\n4. .extern\n\n    The '.extern' directive statement takes one parameter: the name of a label defined in another assembly source file. The purpose of this directive statement is to declare that the label has been defined in another source file and that this assembly source file (the one that contains the '.extern' directive statement) is using it. The value of the label, as defined in the source file where it appears, is matched with the operation instruction(s) that use it as an argument at link time.\n```\n.extern HELLO\n```\n    Notice that a label at the beginning of the '.extern' directive is meaningless.\n\n## Operation Statement\nAn operation statement is composed of the following:\n\n1. Optional label.\n\n2. Operation name.\n\n3. 
Operands (the number of operands may be 0, 1 or 2 depending on the operation).\n\nThe length of a statement (of any type) cannot exceed 80 characters.\nThe name of the operation is written in lower case letters; the operation name can be one of the 16 operations mentioned above.\nAfter the operation name, separated by whitespace character(s), one or two operands may appear. In the case of two operands, the operands are separated by a comma ',' character. As mentioned before, whitespace character(s) may separate the comma and the operands. Operation statements have the following form:\n\n| Label           | Operation   | Operands                ||\n| --------------- | ----------- | ---------- | ----------- |\n|                 |             | Source     | Destination |\n| ``` HELLO: ```  | ``` add ``` | ``` r7,``` | ``` B ```   |\n| ``` JUMP: ```   | ``` jnc ``` |            | ``` XYZ ``` |\n| ``` END: ```    | ``` hlt ``` |            |             |\n\n# Formal Definitions\n\n## Label\nEvery label must begin with an upper or lower case letter; the rest of the label may contain letters or digits. The length of the label cannot exceed 30 characters. The label ends with a colon ':' character. The colon character is not part of the label name; it is just a sign marking the end of the label. The label must begin in the first column of the line. A label name cannot have more than one definition. The following labels are written correctly.\n```\n        hEllo:\n        x:\n        He78940:\n```\nA label name cannot be the same as a register or operation name.\nThe label derives its value from its context. A label written at the beginning of a '.data' or '.string' directive gets the value of the data counter. A label written at the beginning of an operation statement gets the value of the operation counter.\n\n## Number\nA number is a string of decimal digits (0-9) that may optionally be preceded by either a '-' or a '+' sign. 
The number gets its value from the decimal representation given by the string of digits. For instance the numbers\n```\n        76, -5, +123\n```\nare accepted as numbers. As mentioned, we do not handle rational or real numbers, only integers.\n\n## String\nA string is a sequence of visible ASCII characters surrounded by double quotation marks. The quotation marks are not part of the string. The string\n```\n        \"Hello World\"\n```\nis an example of a legal string.\n\n# Two Pass Assembler\nWhen the assembler translates code it needs to carry out two major tasks. Its first task is to identify and translate the operation codes and its second task is to determine addresses for all data and variables that appear in the source file(s). For instance, when the assembler reads the following code:\n```\n.entry MAIN\nMAIN:   mov LEN, r1\n\t    lea STR, r2\nLOOP:   prn @r2\n        inc r2\n        sub #1, r1\n        jnz LOOP\nEND:    hlt\nSTR:    .string \"abcdef\"\nLEN:    .data 6\n```\n\nit has to replace the operation names mov, lea, prn, inc, sub, jnz, hlt with their equivalent binary codes. In addition, the assembler has to replace the symbols STR, LEN, MAIN, LOOP, END with the addresses that have been allocated for them.\nAssuming that the code in example I has been translated by the assembler and stored (operations and directives) in a memory block that starts from address 0000, this translation can be described as follows:\n\n| Label         | Address | Command         | Operand(s)        | Machine Code |\n| ------------- | ------- | --------------- | ----------------- | ------------ |\n|               |         | ``` .entry ```  |  ``` MAIN ```     |              |\n| ``` MAIN: ``` | 0000    | ``` mov ```     |  ``` LEN, r1 ```  | 0219         |\n|               | 0001    |                 |                   | 0012         |\n|               | 0002    | ``` lea ```     |  ``` 
STR, r2 ```  | 621a         |\n|               | 0003    |                 |                   | 000b         |\n| ``` LOOP: ``` | 0004    | ``` prn ```     |  ``` @r2 ```      | c022         |\n|               | 0005    | ``` inc ```     |  ``` r2 ```       | 701a         |\n|               | 0006    | ``` sub ```     |  ``` #1, r1 ```   | 3019         |\n|               | 0007    |                 |                   | 0001         |\n|               | 0008    | ``` jnz ```     |  ``` LOOP ```     | 9008         |\n|               | 0009    |                 |                   | 0004         |\n| ``` END: ```  | 000a    | ``` hlt ```     |                   | f000         |\n| ``` STR: ```  | 000b    | ``` .string ``` |  ``` \"abcdef\" ``` | 0061         |\n|               | 000c    |                 |                   | 0062         |\n|               | 000d    |                 |                   | 0063         |\n|               | 000e    |                 |                   | 0064         |\n|               | 000f    |                 |                   | 0065         |\n|               | 0010    |                 |                   | 0066         |\n|               | 0011    |                 |                   | 0000         |\n| ``` LEN: ```  | 0012    | ``` .data ```   |  ``` 6 ```        | 0006         |\n\nIf the assembler maintains a table of all the operation names and their corresponding binary codes, then all operation names can be easily converted. Whenever the assembler reads an operation name it can simply use the table to find its equivalent binary code. In order to carry the same conversion for the addresses of symbols the assembler has to build similar table.\nFor instance, in example I, prior to reading the source file(s) the assembler has no way to know that the LOOP symbol relates to address 0004.\nThus, in regards to all symbols that have been defined by the programmer, the assembler has to accomplish two separate tasks. 
The first task is to build a table of all symbols and their related numeric values, and the second is to replace all the symbols that appear in the source file(s) with the numeric values of their addresses. These two tasks can be achieved by performing two separate scans (passes) over the source file(s). In the first pass the assembler builds a table of symbols; this table maps each symbol to an address.\nIn the second pass the assembler translates the source file(s) into binary machine code.\nNotice that the two passes are done by the assembler during translation (at assembly time), before the linking process.\nAfter the translation process, the program may be linked and loaded into memory for execution.\n\n## First pass\nIn the first pass, each instruction is substituted with its appropriate code and the table of symbols is built. The rest of the code is left untouched. The code should be loaded at address zero. After applying the first pass to example I, we should get the following result:\n\nThe table of symbols:\n\n| Name | Value | Image       |\n| ---- | ----- | ----------- |\n| MAIN | 0000  | instruction |\n| LOOP | 0004  | instruction |\n| END  | 000a  | instruction |\n| STR  | 0000  | data        |\n| LEN  | 0007  | data        |\n\nList of entries:\n\n| Name | Value |\n| ---- | ----- |\n| MAIN | ????  |\n\nData image:\n\n| Address | Value |\n| ------- | ----- |\n| 0000    | 0061  |\n| 0001    | 0062  |\n| 0002    | 0063  |\n| 0003    | 0064  |\n| 0004    | 0065  |\n| 0005    | 0066  |\n| 0006    | 0000  |\n| 0007    | 0006  |\n\nInstruction image:\n\n| Address | Value |\n| ------- | ----- |\n| 0000    | 0219  | \n| 0001    | ????  |\n| 0002    | 621a  |\n| 0003    | ????  |\n| 0004    | c022  |\n| 0005    | 701a  |\n| 0006    | 3019  |\n| 0007    | ????  |\n| 0008    | 9008  |\n| 0009    | ????  
|\n| 000a    | f000  |\n\n## Second pass\nApplying the second pass to the code of example I yields the following final results:\n\n| Name | Value | Image       |\n| ---- | ----- | ----------- |\n| MAIN | 0000  | object code |\n| LOOP | 0004  | object code |\n| END  | 000a  | object code |\n| STR  | 000b  | object code |\n| LEN  | 0012  | object code |\n\nList of entries:\n\n| Name | Value |\n| ---- | ----- |\n| MAIN | 0000  |\n\nObject code:\n\n| Address | Machine Word |\n| ------- | ------------ |\n| 0000    | 0219         |\n| 0001    | 0012         |\n| 0002    | 621a         |\n| 0003    | 000b         |\n| 0004    | c022         |\n| 0005    | 701a         |\n| 0006    | 3019         |\n| 0007    | 0001         |\n| 0008    | 9008         |\n| 0009    | 0004         |\n| 000a    | f000         |\n| 000b    | 0061         |  \n| 000c    | 0062         |  \n| 000d    | 0063         |  \n| 000e    | 0064         |  \n| 000f    | 0065         |  \n| 0010    | 0066         |  \n| 0011    | 0000         |  \n| 0012    | 0006         |\n\nWhen the assembler is done, object code has been generated; this object code is to be sent to a linker program. The purposes of the linker program are as follows:\n\n1. To allocate a place in memory for the program (allocation).\n2. To link the object files into one executable file (linking).\n3. To change addresses according to the loading place (relocation).\n4. To physically load the code into memory.\n\nAfter the linker program is done the program can be loaded into memory and is ready to run. We do not discuss further how the linker program works.\n\n# The format of output files\nThe object file written by the assembler provides information about the machine's memory. 
The first instruction is placed at memory address 0, the second at memory address 2, 3 or 4 (depending on the length of the first instruction), and so forth until the last instruction has been translated. The memory addresses after the last translated instruction contain the data built by the '.data' and '.string' directives, in the order in which they appear in the source file (the first directive occupies the first free memory address, in ascending order).\n\n## The object code file (.oc)\nThe object file is composed of lines of text and contains 3 sections: code, entries and externals. \n\n### code\nThe code section starts with '.cbegin' and ends with '.cend'.\nThe first line contains (in hex) the length of the code and the length of the data, both in memory words. The two numbers must be separated by white space. Each of the following lines gives the content of one memory address (in hex), starting from memory address 0. In addition, each memory address occupied by an instruction (not data) carries additional information for the linker: one of the three characters 'e', 'a' or 'r'. The character 'a' indicates that the content of the memory address is absolute and does not depend on where the file is loaded (the assembler assumes it starts at memory address 0). The character 'r' indicates that the content of the memory address is relocatable and must have the appropriate offset added to it, depending on where the file is loaded. The offset is the memory address at which the first instruction of the program is loaded. 
The character 'e' indicates that the content of the memory address depends on an external variable; the linker is responsible for inserting the appropriate value.\n\n### entries\nThe entries section starts with '.lbegin' and ends with '.lend'.\nThe entries section is composed of lines of text. Each line contains the name of an entry and its value, as computed for this file.\n\n### externals\nThe externals section starts with '.ebegin' and ends with '.eend'.\nThe externals section is composed of lines of text. Each line contains the name and memory address of an external variable.\n\n## Binary file (.bin)\nThe binary file contains the object code in binary (non-text) format. It cannot be created if the source code contains .extern directives.\n\n## Example files\n### test\nPrints the string \"abcdef\".\n\n*test.as*\n```\n; test.as\n; Prints the string \"abcdef\".\n\n        .entry MAIN      ; file contains the definition of MAIN\nMAIN:   mov LEN, r1\t     ; move LEN(=6) to r1\n        lea STR, r2\t     ; load the address of STR to r2\nLOOP:   prn @r2          ; print the character at the memory location that r2 holds\n        inc r2           ; r2 = r2 + 1\n        sub #1, r1       ; r1 = r1 - 1\n        jnz LOOP         ; jump to LOOP if the zero flag is not set (sub sets it)\nEND:    hlt              ; end of the program\nSTR:    .string \"abcdef\" ; string to print\nLEN:    .data 6          ; length of the string\n```\n*test.oc*\n```\n.cbegin\nb 8\n0000 0219 a\n0001 0012 r\n0002 621a a\n0003 000b r\n0004 c022 a\n0005 701a a\n0006 3019 a\n0007 0001 a\n0008 9008 a\n0009 0004 r\n000a f000 a\n000b 0061  \n000c 0062  \n000d 0063  \n000e 0064  \n000f 0065  \n0010 0066  \n0011 0000  \n0012 0006  \n.cend\n.lbegin\nMAIN 0000\n.lend\n.ebegin\n.eend\n```\n\n# Usage of tas\n\n```\ntas \u003coptions\u003e source-file\n```\nwhere the options are:\n```\n-l : prints debugging lists after each pass\n-n : creates NO output files\n-b : creates binary output file\n-h : 
shows this text\n```\n\n# Compilation of tas\n\n*Windows*\n```\ncd tas\nmkdir build\ncd build\ncmake ..\ntas.sln\n```\n*Linux*\n```\ncd tas\nmkdir build\ncd build\ncmake ..\nmake\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fg0mb4%2Ftas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fg0mb4%2Ftas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fg0mb4%2Ftas/lists"}