{"id":13629181,"url":"https://github.com/sysprog21/jitboy","last_synced_at":"2026-03-17T18:04:36.676Z","repository":{"id":44935057,"uuid":"315331978","full_name":"sysprog21/jitboy","owner":"sysprog21","description":"A Game Boy emulator with dynamic recompilation (JIT)","archived":false,"fork":false,"pushed_at":"2023-05-29T21:55:07.000Z","size":234,"stargazers_count":312,"open_issues_count":2,"forks_count":14,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-05-08T23:53:51.161Z","etag":null,"topics":["dynamic-compiler","dynasm","emulator","game-boy","gameboy","gameboy-emulator","gbz80","jit","jit-compiler","sdl2"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sysprog21.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-11-23T13:57:13.000Z","updated_at":"2025-05-08T03:26:54.000Z","dependencies_parsed_at":"2024-01-14T06:25:02.649Z","dependency_job_id":"8d649ee8-40e8-41aa-929a-954bb2600a31","html_url":"https://github.com/sysprog21/jitboy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sysprog21%2Fjitboy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sysprog21%2Fjitboy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sysprog21%2Fjitboy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sysprog21%2Fjitboy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sy
sprog21","download_url":"https://codeload.github.com/sysprog21/jitboy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253166486,"owners_count":21864471,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dynamic-compiler","dynasm","emulator","game-boy","gameboy","gameboy-emulator","gbz80","jit","jit-compiler","sdl2"],"created_at":"2024-08-01T22:01:03.950Z","updated_at":"2026-03-17T18:04:31.643Z","avatar_url":"https://github.com/sysprog21.png","language":"C","funding_links":[],"categories":["C"],"sub_categories":[],"readme":"# jitboy\n\nA Game Boy emulator with dynamic recompilation (JIT) for x86-64.\n\n## Overview\n\nSince most of the games published for the Game Boy are only available in\nbinary form as ROM images, porting them to current systems is not feasible.\nAn alternative is the emulation of the Game Boy architecture: a runtime\nenvironment is provided that matches the Game Boy as closely as possible and\nis able to execute the unmodified program. Due to the incompatible\nprocessor architecture, the instruction sequence of the emulated programs\ncannot be executed directly: either it is interpreted instruction by\ninstruction, i.e. the fetch-execute cycle is carried out in software, or\nit is translated into compatible instructions. The second method, often\ncalled \"dynarec\" or just-in-time (JIT) compilation, is used in many emulators\nbecause of its potentially higher speed.\n\nThe instructions are translated mostly dynamically at runtime, since\nstatic analysis is difficult - e.g. 
by tracking all possible execution\npaths from a known entry point. Self-modifying code and jumps to\naddresses calculated at runtime often make a fallback to interpretation or\ndynamic translation at runtime necessary in the case of static translation.\n\nThe emulator `jitboy` carries out a dynamic translation of the processor\ninstructions. All other interfaces (graphics, sound, memory) are additionally\nemulated by interpreting the address space.\n\n## Game Boy Specification\n\n| Component    | Detail                                                 |\n|------------- |--------------------------------------------------------|\n| CPU          | 8-bit (Similar to the Z80 processor)                   |\n| Clock Speed  | 4.194304MHz (4.295454MHz for SGB, max. 8.4MHz for CGB) |\n| Work RAM     | 8K Byte (32K Byte for CGB)                             |\n| Video RAM    | 8K Byte (16K Byte for CGB)                             |\n| Screen Size  | 2.6\"                                                   |\n| Resolution   | 160x144 (20x18 tiles)                                  |\n| Max sprites  | Max 40 per screen, 10 per line                         |\n| Sprite sizes | 8x8 or 8x16                                            |\n| Palettes     | 1x4 BG, 2x3 OBJ (for CGB: 8x4 BG, 8x3 OBJ)             |\n| Colors       | 4 grayshades (32768 colors for CGB)                    |\n| Horiz Sync   | 9.198 kHz (9.420 kHz for SGB)                          |\n| Vert Sync    | 59.73 Hz (61.17 Hz for SGB)                            |\n| Sound        | 4 channels with stereo sound                           |\n| Power        | DC6V 0.7W (DC3V 0.7W for GB Pocket, DC3V 0.6W for CGB) |\n\n\n## CPU\n\nThe main processor of the Game Boy is a Sharp LR35902, a mix between the Z80 and\nthe Intel 8080 that runs at 4.19 MHz. 
It is usually called \"GBZ80\"; however,\nit is neither a Z80-compatible nor an 8080-compatible processor.\n\u003e The CPU model is LR35902, and its core is SM83.\n\nThe Z80 is an 8-bit microprocessor, meaning that each operation is natively\nperformed on a single byte. The instruction set does have some 16-bit\noperations but these are just executed as multiple cycles of 8-bit logic.\nThe Z80 has a 16-bit wide address bus, which logically represents a 64K memory\nmap. Data is transferred to the CPU over an 8-bit wide data bus but this is\nirrelevant to simulating the system at state machine level. The Z80 and the\nIntel 8080 that it derives from have 256 I/O ports for accessing external\nperipherals but the Game Boy CPU has none - favouring memory mapped I/O (MMIO)\ninstead.\n\n| Type           | CPU Speed | NOP Instruction |\n|----------------|-----------|-----------------|\n| Machine Cycles | 1.05MHz   | 1 cycle         |\n| Clock Cycles   | 4.19MHz   | 4 cycles        |\n\nNote that 1 machine cycle = 4 clock cycles.\n\n### Registers\n\nThe Intel 8080 and Game Boy CPU have six 8-bit general purpose registers, an\naccumulator, flags, stack pointer and program counter. 16-bit access is also\nprovided to each general purpose register and the accumulator and flags registers\nin sequential pairs. Additionally, the Z80 has two more 16-bit index registers,\nan alternative set of each general purpose, accumulator and flags registers and\na few more bits and pieces.\n\nThe Game Boy CPU has one bank of general purpose 8-bit registers: `A`, `B`, `C`,\n`D`, `E`, `F`, `H` and `L`.\n\nWhile the CPU only has 8-bit registers, there are instructions that allow the\ngame to read and write 16 bits (i.e. 2 bytes) at the same time. 
These register pairs\nare referred to as `AF` (\"a\" and \"f\" combined), `BC` (\"b\" and \"c\" combined),\n`DE` (\"d\" and \"e\" combined), and finally `HL` (\"h\" and \"l\" combined).\n\n| Register | Size                | Purpose                           |\n|----------|---------------------|-----------------------------------|\n| AF       | 16-bit or two 8-bit | Accumulator (A) and flag bits (F) |\n| BC       | 16-bit or two 8-bit | Data/address                      |\n| DE       | 16-bit or two 8-bit | Data/address                      |\n| HL       | 16-bit or two 8-bit | Accumulator/address               |\n| SP       | 16-bit              | Stack pointer                     |\n| PC       | 16-bit              | Program counter                   |\n\nThe Z80 defines alternative/banked versions of `AF`, `BC`, `DE` and `HL` that are\naccessed via the exchange opcodes and also has some more specialized registers.\n\n| Register | Size                | Purpose                        |\n|----------|---------------------|--------------------------------|\n| IX       | 16-bit or two 8-bit | Displacement offset base       |\n| IY       | 16-bit or two 8-bit | Displacement offset base       |\n| I        | 8-bit               | Interrupt vector base register |\n| R        | 8-bit               | DRAM refresh counter           |\n\nThe flags register is a single byte that contains a bit-mask set according to\nthe last result. 
Notice that the Game Boy flags register only uses the most\nsignificant 4 bits and does not implement the sign or parity/overflow flags.\nThe least significant 4 bits of the Game Boy flags register are always 0.\n\n| 8080/Z80 Bit | Game Boy Bit | Name            |\n|--------------|--------------|-----------------|\n| 0            | 4            | C: Carry        |\n| 1            | 6            | N: Subtract     |\n| 2            | -            | Parity/Overflow |\n| 3            | -            | Undocumented    |\n| 4            | 5            | H: Half Carry   |\n| 5            | -            | Undocumented    |\n| 6            | 7            | Z: Zero         |\n| 7            | -            | Sign            |\n\n### Core\n\nA CPU runs on a fetch-decode-execute cycle, called the machine cycle or m-cycle.\nThe CPU will initially fetch a byte, whose location in the address space is pointed\nto by the program counter register (PC), decode it as an instruction (opcode) and\nexecute it, or contextually use it as a literal for a previous cycle. Opcodes that\ndo not change the program flow themselves (unlike jumps or calls) end a cycle by\nincrementing the program counter to point at the next byte in the address space.\nOpcode length is variable and whilst some operations run in a single cycle, others\nrequire multiple fetch-decode-execute cycles to run.\nHere is an example of running three simple opcodes on a Z80:\n\n![Example fetch-decode-execute on the Z80](/assets/cpu.svg)\n\nWe are not really concerned with this low-level cycle as software cannot control\nit, but we do need to keep track of how many have occurred so that we have a mechanism\nto match (read: approximate) platform timing. Our higher level cycle will be based on\na concept of an operation, which can be represented by one or more opcodes and optional\nliterals.\n\nEach operation cycle will:\n1. Fetch the next opcode.\n2. Decode the fetched opcode.\n3. 
Fetch any extra data required to resolve the operation including extra opcodes\n   and literals.\n4. Record all m-cycles consumed in the operation so that we can block later to\n   adjust our timings.\n5. Execute the opcode.\n\n### Instructions\n\nInstructions are 1 to 3 bytes long depending on the specific instruction.\nOpcodes can be seen as 9 bits long, and are encoded in 1 or 2 bytes. If the\nfirst byte is `0xCB`, then the second byte selects one of the high 256 opcodes;\notherwise, the first byte is one of the low 256 opcodes.\n\nFor example, if the first byte is `0x43`, then the opcode of this instruction is\n`0x043`; if the first byte is `0xCB` and the next byte is `0x43`, then the\nopcode of this instruction is `0x143`.\n\nAfter the opcode, there can be an optional immediate, 8 or 16 bits long. Since\n`0xCB`-prefixed opcodes never take an immediate, the total length is 1 to 3 bytes.\n\n### Execution Timing\n\nThe processor runs at either 4 MiHz (4194304 Hz = 2^22 Hz) or 8 MiHz (Double Speed\nMode on CGB). The instruction execution time is always divisible by 4, ranging from\n4 cycles to 24 cycles. Usually a clock cycle at 4 MiHz is called a T-cycle.\n4 T-cycles combined are called an M-cycle (1 MiHz). So, one instruction can\ntake 1 to 6 M-cycles to execute.\n\nThe processor can do one memory read or memory write in one M-cycle. Since the\ninstruction itself needs to be fetched, execution can never be faster than\nthe speed at which the instruction can be read. For example, a 3 byte instruction needs at\nleast 3 M-cycles (12 T-cycles) to execute. 
If the instruction involves memory reads\nor writes, the processor has to spend more M-cycles just to access the memory.\n\nThe processor is also only capable of doing one 8-bit ALU operation per M-cycle.\nIf the instruction needs to do a 16-bit ALU operation, one additional M-cycle may be\nneeded to complete the operation.\n\nThe processor also has a prefetch queue with a length of 1 byte.\n\n## Interrupts\n\nThe Game Boy provides a total of five different interrupts:\n* `VBLANK`\n  - The `VBLANK` interrupt is triggered after each displayed image and marks the\n    beginning of the `VBLANK` phase in which the video memory can be freely accessed\n    for 4560 clock cycles.\n* `STAT`\n  - The `STAT` register (memory address `0xFF41`) changes between three states with\n    each displayed image line and to a fourth during the `VBLANK` phase. The `STAT`\n    interrupt can be triggered when these states change. Which state transitions\n    trigger it can be selected.\n* `Timer`\n  - The timer interrupt is triggered when the timer register (`0xFF05`) overflows.\n    The rate at which the timer register is incremented can be selected so that the\n    timer interrupt occurs at a selectable rate of 16Hz, 64Hz, 256Hz or 1kHz.\n* Serial\n  - The serial transfer interrupt is triggered when a serial transfer is completed.\n* Joypad\n  - Every time one of the eight buttons is pressed, the joypad interrupt is triggered.\n\nIf an interrupt occurs, it becomes pending and a bit is set in the interrupt flag\nregister (`0xFF0F`). The interrupt enable register (`0xFFFF`) can be used to select\nwhich interrupts are active. The interrupt master enable flag can also turn off\nall interrupts. 
It can be manipulated with the instructions `DI` (Disable Interrupts),\n`EI` (Enable Interrupts) or `RETI` (Return from Interrupt).\n\nIf an interrupt is pending and both the corresponding bit in the Interrupt Enable\nRegister and the Interrupt Master Enable flag are set, a handler function with a fixed\nstart address between `0x40` (`VBLANK`) and `0x60` (Joypad) is called, and further\ninterrupts are blocked while it runs by clearing the Interrupt Master Enable flag.\n\n## Memory and Memory Mapped I/O Devices\n\nThe relationship between CPU, memory management unit (MMU), memory and memory\nmapped I/O (MMIO) devices looks something like the following.\n\n![Simple diagram of the Game Boy MMU](/assets/mmu.svg)\n\nAn MMU should support reading and writing data in various lengths across the\nentire address space, whilst abstracting away the hardware that is physically\nattached to each location in the space.\n\nWe can implement an MMU in a platform-agnostic way by introducing a concept of\nsegments. A segment has a location and length so that the MMU can correctly position\nit in the address space, and it provides implementation-specific data access operations.\nFor example, most Game Boy cartridges have a microcontroller acting as a memory bank\ncontroller (MBC) over multiple banks of read only memory (ROM). Read requests for data\nin an MBC address space will be forwarded to a configured page of ROM, whereas write\nrequests will change which page is configured. 
For this reason we really need\ndifferent interfaces for readable and writeable segments.\n\n### Memory Map\n\nThe 16-bit address space covers ROM, RAM, and I/O registers.\n\n| Address   | Usage                                                        |\n|-----------|--------------------------------------------------------------|\n| 0000-3FFF | 16KB ROM Bank 00 (in cartridge, fixed at bank 00)            |\n| 4000-7FFF | 16KB ROM Bank 01..NN (in cartridge, switchable bank number)  |\n| 8000-9FFF | 8KB Video RAM (VRAM) (switchable bank 0-1 in CGB Mode)       |\n| A000-BFFF | 8KB External RAM (in cartridge, switchable bank, if any)     |\n| C000-CFFF | 4KB Work RAM Bank 0 (WRAM)                                   |\n| D000-DFFF | 4KB Work RAM Bank 1 (WRAM) (switchable bank 1-7 in CGB Mode) |\n| E000-FDFF | Same as C000-DDFF (ECHO) (typically not used)                |\n| FE00-FE9F | Sprite Attribute Table (OAM)                                 |\n| FEA0-FEFF | Not Usable                                                   |\n| FF00-FF7F | I/O Ports                                                    |\n| FF80-FFFE | High RAM (HRAM)                                              |\n| FFFF      | Interrupt Enable Register                                    |\n\nThe addresses between `0x8000` and `0x9FFF` form the video RAM. It contains 8 × 8 pixel\ntiles of 16 bytes each, as well as foreground and background tile maps.\n\nThe cartridge RAM is mapped between `0xA000` and `0xBFFF`. Depending on the MBC,\nseveral banks can be swapped. In some game cartridges, this memory is backed by\na battery and can therefore retain the game state even when the Game Boy is switched off.\n\nThis is followed by 8kB of internal RAM (`0xC000` - `0xDFFF`), which is almost\ncompletely mirrored a second time in the address range `0xE000` - `0xFDFF`. However,\nthese addresses are typically not used. The addresses `0xFE00` to `0xFE9F` contain the\nOAM memory. 
It contains the position, the graphic to be displayed, the grayscale\npalette used and the flags of all 40 sprites. The OAM memory can be overwritten\nas a whole via a DMA transfer.\n\nThe hardware IO is controlled via the address range `0xFF00` to `0xFF7F`. It contains\nregisters for controlling timers, serial transfers, DMA transfers, sound output and\nthe map area to be displayed. This is followed by a further 127 bytes of main memory\n(`0xFF80` - `0xFFFE`), which can be read and written at any time. Since all other\nmemory can neither be read nor written during a DMA transfer, a jump must be made to\nthis memory area during such a transfer.\n\nThe interrupt enable register occupies the highest address, `0xFFFF`.\n\n### Jump Vectors in First ROM Bank\n\nThe following addresses are supposed to be used as jump vectors:\n* 0000,0008,0010,0018,0020,0028,0030,0038 for RST commands\n* 0040,0048,0050,0058,0060 for Interrupts\n\nHowever, the memory may be used for any other purpose if your program\ndoes not use any (or only some) `RST` commands or interrupts.\n`RST` commands are 1-byte opcodes that work similarly to `CALL` opcodes, except that\nthe destination address is fixed.\n\n### Internal RAM Echo\n\nThe addresses E000-FE00 appear to access the internal RAM the same as C000-DE00\n(i.e. if you write a byte to address E000 it will appear at C000 and E000. Similarly,\nwriting a byte to C000 will appear at C000 and E000.)\n\n### Cartridge Header in First ROM Bank\n\nThe memory at 0100-014F contains the cartridge header.\nThis area contains information about the program, its entry point, checksums,\ninformation about the used MBC chip, the ROM and RAM sizes, etc. 
Most of the\nbytes in this area are required to be specified correctly.\n\n### External Memory and Hardware\n\nThe areas from 0000-7FFF and A000-BFFF may be used to connect external hardware.\nThe first area is typically used to address ROM (read only, of course); cartridges\nwith Memory Bank Controllers (MBCs) additionally use this area to output data\n(write only) to the MBC chip.\n\nThe second area is often used to address external RAM, or to address other external\nhardware (Real Time Clock, etc). External memory is often battery buffered, and may\nhold saved game positions and high score tables (etc.) even when the Game Boy is\nturned off, or when the cartridge is removed.\n\n## JIT Compilation\n\nFor the emulation of the Game Boy hardware on conventional PCs (x86-64\narchitecture) a JIT compiling emulation core was implemented.\nInstead of decoding and interpreting individual instructions in a loop, as\nwith an interpreting emulator, an attempt is made to combine entire blocks\nthat usually end with a jump instruction (`JP`, `JR`, `CALL`, `RST`, `RET`, `RETI`).\nBy means of the [DynASM](https://luajit.org/dynasm.html) runtime assembler of\nthe [LuaJIT](https://luajit.org/) project, x86 instructions corresponding to\nthe block are generated and executed the first time\na memory address is jumped to.\n\nOne goal during development was to use the status flags (carry, half-carry /\nadjust and zero) of the host architecture for the emulated environment instead\nof emulating them. In most cases this is possible without any problems, since\nthe Z80-like Game Boy CPU [LR35902](https://pastraiser.com/cpu/gameboy/gameboy_opcodes.html)\nand the Intel 8080 architecture, which is largely also supported by modern\nprocessors, are very similar. 
Since the subtract flag of the Game Boy has no\ndirect equivalent in the x86-64 architecture, it is the only one of the status\nflags that has to be emulated.\n\nJumps are not executed directly, but instead the jump target is saved and\nthe generated function is exited with `RET`. This allows the runtime environment\nto first compile the block at the jump target and perform other parallel tasks,\nincluding interrupt, graphics, input and DMA emulation.\n\nDuring the compilation of a program block, the number of Game Boy clock cycles\nrequired up to this point is calculated for each possible end over which the\nblock can be exited, and this sum is added to an instruction counter during\nexecution. By means of this counter, events that occur on the Game Boy at\ncertain times, such as timer or `VBLANK` interrupts, can be precisely timed\ndespite the higher speed of the host platform. Since there may be routines in\nthe emulated programs that are dependent on a fixed number of executed\ninstructions in a certain period of time, the timers of the host system cannot\nbe used without compatibility problems. Due to the block-wise execution, however,\nthe emulator presented here has the problem that interrupts and\ntimers may be handled or updated a few clock cycles late, only after the next jump.\n\nDuring the execution of compiled program blocks, the register set of the Game Boy\nis mapped directly to registers of the x86-64 architecture. At the end of a block,\nthe entire Game Boy register set, processor flags and the number of emulated clock\ncycles must be saved (struct `gb_state`). The following table shows the register\nusage during the execution of translated blocks. 
The combined registers `AF`, `BC`,\n`DE` and `HL` required for the 16-bit instructions of the Game Boy are first put\ntogether in a temporary register and written back after the instruction.\n\n| Game Boy | x86-64   | comment |\n|----------|----------|---------|\n| A        | r0 (rax) | accumulator |\n| F        | -        | generated dynamically from the `FLAGS` register |\n| B        | r1 (rcx) | |\n| C        | r2 (rdx) | |\n| D        | r3 (rbx) | |\n| E        | r13      | |\n| H        | r5 (rbp) | |\n| L        | r6 (rsi) | |\n| SP       | r7 (rdi) | |\n| PC       | -        | not necessary |\n| -        | r8       | base address of Game Boy address space |\n| -        | r9       | address of struct `gb_state` |\n| -        | r10      | temporary register |\n| -        | r11      | temporary register |\n| -        | r12      | temporary register |\n| -        | r4 (rsp) | host stack pointer |\n\nA second important goal of the implementation was the support of direct read\nmemory access: to read an address of the Game Boy address space, only one\nadditional addition of a base pointer should be necessary. This is not possible\nfor write memory accesses, since write accesses to addresses in the ROM lead to\nbank changes by the MBC and some IO registers trigger certain actions such as\nDMA transfers or reading the joypad buttons during write accesses. Write access\nis therefore replaced by a function call that emulates any necessary side effects.\n\nDirect read access has some important implications:\n* Hardly any reading overhead: Compared to the Game Boy, there is hardly any reading\n  overhead with the emulation. 
Since read memory accesses are often among the most\n  frequent instructions, this means a significant increase in efficiency.\n* The emulated Game Boy address space must be consecutive: the change of ROM or RAM\n  banks requires a lot of additional effort, as the corresponding bank must first\n  be mapped into the address space by `munmap` and `mmap`.\n* Status registers must always be updated: the program sequence must be interrupted\n  frequently in order to update special status registers such as the TIMA timer\n  (`0xFF05`) or the currently drawn image line LY (`0xFF44`). If this does not\n  happen, waiting loops may no longer terminate.\n\n### Exemplary translation of a block\n\nThe individual steps for translating and executing a program block are\nillustrated by an example: The following listing shows a block from the game\n\"Super Mario World\".\n```\n3E 02\t\tLD A, 2\nEA 00 20\tLD (0x2000), A\nE0 FD\t\tLDH (0xFD), A\nFA 1D DA\tLD A, (0xDA1D)\nFE 03\t\tCP A, 3\n20 0B\t\tJR NZ, 0x0B\n3E FF\t\tLD A, 0xFF\nEA 1D DA\tLD (0xDA1D), A\nCD E8 09\tCALL 0x9E8\n```\n\nIn the first step, instructions are read to the end of the block. Every unconditional\njump (`JP`, `CALL`, `RST`, `RET`, `RETI`), as well as `EI` (Enable Interrupts),\nterminates a block. The instructions are stored in a linked list and grouped according\nto their type. Various rules for optimization are applied to this instruction list,\nand instructions for saving and restoring the status register are inserted. 
Then the\nappropriate x86-64 assembly is generated - the example is translated to the following\ncode (without optimization):\n```\n    prologue\n    mov A, 2\n    write_byte 0x2000, A\n    write_byte 0xfffd, A\n    mov A, [aMem + 0xda1d]\n    cmp A, 3\n    save_cc\n    restore_cc\n    jz \u003e1\n    add qword state-\u003einst_count, 17\n    return 0x239\n1:  mov A, 0xff\n    write_byte 0xda1d, A\n    dec SP\n    dec SP\n    and SP, 0xffff\n    mov word [aMem + SP], 0x235\n    add qword state-\u003einst_count, 28\n    mov byte state-\u003ereturn_reason, REASON_CALL\n    return 0x9e8\n```\n\nSome macros are used for simplification:\n* `prologue` saves all necessary registers and restores the Game Boy register contents.\n* `return` saves all register contents in the `gb_state` struct, restores the original\n  register contents, writes the argument in the result register and exits the function\n  with `RET`.\n* `write_byte` calls the function `gb_memory_write`.\n* `save_cc` saves the status register on the stack.\n* `restore_cc` restores the status register from the stack.\n\n`aMem` designates the register `r8`, which contains the base address of the Game Boy\naddress space, and `state` the register `r9`, which contains the address of the `gb_state`.\n`state-\u003einst_count` counts executed Game Boy clock cycles, `state-\u003etrap_reason`\nspecifies which instruction terminates the block in order to update the backtrace in\nthe debugger. However, the debugger is not yet implemented.\n\nIn the next step, `DynASM` is used to assemble the code and write it into a\npre-allocated memory area. After this area has been made executable using `mprotect`, the\nfunction can be executed. To speed up repeated execution, the function pointer is stored,\nindexed by the start address. 
The memory address returned by the function is the\nstart address of the next block to be executed.\n\u003e If an interrupt occurs, it is instead placed on the Game Boy stack and the start\n\u003e address of the interrupt handler is jumped to.\n\nIf a memory address within the RAM is jumped to, it must be assumed that the sequence\nof instructions may have changed by the next execution. Blocks within the RAM are\ntherefore discarded again after execution. Blocks within the addresses `0xFF80` to\n`0xFFFE` are an exception: a jump into this area must be made briefly during DMA\ntransfers. The routine that waits for the transfer to end does not usually change\nduring the execution of the program. It is therefore worthwhile to temporarily store\nthe blocks until there is a write access to this memory area.\n\n### Optimization\n\nAfter reading an instruction block, some rules for optimization are applied. Loops\ninterrupt the program flow very frequently with a large number of jumps and thus\ncause a very high overhead for saving and restoring the register contents and for\nchecking for interrupts. For this reason, most of the implemented optimizations\nare for the detection and handling of loops.\n\nThe easiest way to recognize loops is by jumping to the start address of the\ncurrent block, since a new block is translated from this start address after\nthe first iteration and the return to the beginning of the loop.\n\nIf there is no read or write memory access in the loop body and all interrupts\nreturn with `RET` or `RETI`, the entire loop can be executed atomically. 
In this case\nit is irrelevant whether an interrupt is executed before, during or after the loop.\nSimple waiting loops that wait a fixed number of clock cycles can be accelerated\nin this way: the return to the beginning of the block is carried out directly\nwithout relinquishing control to the runtime environment and checking for\ninterrupts in the meantime.\n\nWrite memory accesses can also usually be carried out safely. In this case,\nhowever, an interrupt handler can possibly be influenced by the loop and behave\nincorrectly due to the additionally executed loop iterations. Read memory\naccess, on the other hand, carries a considerable risk: a waiting loop waiting\nfor an interrupt or timer may no longer terminate. Since read access to timer\nand status registers is usually carried out with special instructions\n(`LDH A, (a8)` or `LDH A, (C)`), other read instructions are allowed in loops\nin higher optimization levels.\n\nThe following loop executes a `memset` on a memory area of length `BC` with\nend address `HL` and can be executed without interruption with the above\noptimizations:\n```\n32\t\tLD (HL-), A\t; Set byte and decrement HL\n05\t\tDEC B\n20 FC\t\tJR NZ, 0xFC\t; jump to the beginning\n0D\t\tDEC C\n20 F9\t\tJR NZ, 0xF9\t; jump to the beginning\n```\n\nOther optimizations use pattern matching to search for known and frequent\ninstruction sequences that can be simplified. 
The following frequently used\npattern waits until a specific line of the display is drawn.\n\u003e A `VBLANK` can also be waited for if the line is \u003e 144.\n\n```\nF0 44\t\tLDH A, (0x44)\t; read current display line\nFE ??\t\tCP A, ??\t; compare with a fixed value\n20 FA\t\tJR NZ, 0xFA\t; jump to the beginning\n```\n\nInstead, a modified `HALT` instruction can be inserted in the emulation, which\nwaits for the corresponding display line to be drawn instead of an interrupt.\n\n## Graphics\n\nThe pixels of the Game Boy display cannot be addressed individually; rather,\nwhole tiles of 8 × 8 pixels each are displayed. In addition to a foreground\nand background map (called `WIN` and `BG`) that contain the indices of the tiles\nto be displayed, up to 40 sprites can be freely positioned on the display.\n\nThe image is built up line by line from top to bottom. The line currently being\nprocessed can be read via register `LY` (`0xFF44`), and the `STAT` register\n(`0xFF41`) indicates whether access to the graphics memory is currently possible.\n\nThe size of the foreground and background map is 32 by 32 tiles, so that only\na section is visible on the display. Via the registers `SCX` (Scroll X, `0xFF43`),\n`SCY` (Scroll Y, `0xFF42`), `WX` (Window X, `0xFF4B`) and `WY` (Window Y, `0xFF4A`),\nthe area to be displayed can be selected. By changing the visible area while the\nimage is being built up, wave effects can be created on the display.\n\n![Determination of the brightness value of a background pixel](/assets/background-tiles.svg)\n\nThe above figure shows an example of how the color of a background pixel comes about:\nFirst of all, the tile indices for the currently drawn image line are determined from\nthe Background Tile Map; this can be selected from either `0x9800` or `0x9C00`.\nThe tile data table is indexed from `0x8000` or `0x8800` via this index. 
The brightness\nvalue of the x-th pixel of the y-th tile line can then be built up from the x-th bit of\nthe (2 * y)-th and (2 * y + 1)-th bytes. The structure for a foreground pixel is analogous.\nWhen displaying sprites, the OAM memory is used instead of a tilemap: it contains\na 4-byte structure for each of the 40 sprites, which holds the tile index and some\nflags in addition to the screen position. These flags can be used to mirror the sprite,\ndisplay it behind the background or with a different grayscale palette.\n\nSince the graphics output of the Game Boy takes place via special control registers\nas well as defined memory areas for tilemaps, the emulator must interpret these\nmemory areas and generate the corresponding image pixel by pixel. It is not enough\nto interpret the memory once at the beginning of the `VBLANK` period and to output\nthe image, since many games use the display timing to create graphic effects.\nIf these are to be displayed correctly, the image must also be generated line by line\nin the emulator.\n\nAfter each executed instruction block, the `LY` register (approx. every 450 clock cycles) and\nthe `STAT` register (approx. every 80, 180, 190 clock cycles) are updated in the course\nof interrupt handling. If the `LY` register is incremented, the next image line can be\ndrawn. After 144 lines have been drawn, at the beginning of the `VBLANK` period, the\ngenerated image can finally be passed on to the rendering thread for display. 
A separate\nrendering thread relieves the main thread of the slow updating of the image texture and\nits display, and halves the runtime of the main thread per frame.\n\u003e With each processed line, the `STAT` register runs through three modes of different duration.\n\nThe start of the `VBLANK` period is also used to limit the speed: If less than 1/60 s has\npassed since the last `VBLANK`, there is a correspondingly long wait before the execution\nis continued.\n\n## State saving\n\nSome Game Boy cartridges include RAM. When a cartridge with RAM is inserted, the RAM is\nmapped at `0xA000`-`0xBFFF` in the Game Boy Memory Management Unit. The RAM in the cartridge is\nbattery-backed, which allows game state such as high score tables or a\ncharacter's position to be saved. Even after the Game Boy is turned off, the state can be\nrestored the next time it is turned on.\n\nOn startup, jitboy looks for a file named after the ROM with the suffix `sav` appended.\nIf it exists, every byte in the file is copied to the RAM banks. Since\nthe MBC selects one of these RAM banks to be mapped at `0xA000` to `0xBFFF`, the saved\nstate is restored. When jitboy is closed, the contents of the RAM banks are copied back to the\nsame file.\n\n## Instruction compliance tests\nAn instruction tester is used to validate the correctness of jitboy's instruction implementation.\nHowever, some limitations of jitboy make integration with the tester difficult, so several\nworkarounds are needed.\n\n* Because of JIT compilation, instructions are not executed one at a time, which may\nviolate the expectations of the instruction tester. When single-stepping on jitboy, the\ncorrect program counter is not always available when the JIT code returns. Since jitboy only\ncares about the address after JMP/CALL-type instructions, we let the tester check only this. 
Otherwise,\nwe introduce an indirection layer for memory/register manipulation and pass the tester a\nprogram counter that does not come directly from jitboy.\n* When the callback function `gbz80_set_state` is executed, we adjust the contents of\n`vm-\u003ememory.mem` to match what the tester expects, so that the tests can pass.\n* The tester expects the function `gbz80_mmu_write` to be executed whenever the\ncontent of memory changes. In our integration, `gbz80_mmu_write` is called from\n`gb_memory_write`. However, jitboy sometimes writes memory directly from JIT code without going\nthrough `gb_memory_write`. We therefore reuse the macro `write_byte`, which calls `gb_memory_write`, to\nwrite the same value to the same position again.\n* For instructions that write two bytes, another macro, `ld16`, is used to call\n`gbz80_mmu_write`.\n* The instruction tester requires code generation for setting and reloading flags.\nThe opcodes `0xfc` and `0xfd`, which are invalid on the original Game\nBoy, are repurposed for this. By mapping these opcodes to specific JIT code, we can\nset the state and read it back as the tester requires.\n\n## Build\n\n`jitboy` relies on some third-party packages to be fully usable and to\nprovide full access to all of its features. You need to have a working\n[SDL2 library](https://www.libsdl.org/) on your target system.\n* macOS: `brew install sdl2`\n* Ubuntu Linux / Debian: `sudo apt install libsdl2-dev`\n\nBuild the emulator:\n```shell\nmake\n```\n\n`build/jitboy` is the built executable file, and you can use it to load Game Boy\nROM files.\n\nRuntime options:\n* `-O` specifies the optimization level. 
Typically, you can use `-O 3`.\n\nTo enable extra debugging information, rebuild the emulator:\n```shell\nmake clean debug\n```\n\nVerbose messages are then printed when `jitboy` loads and runs the given ROM file.\nMeanwhile, files whose names start with `/tmp/jitcode` are generated during JIT\ncompilation. You can disassemble them with the following command:\n```shell\nobjdump -D -b binary -mi386 -Mx86-64 /tmp/jitcode?\n```\n\nTo run the instruction tester:\n```shell\nmake check\n```\n\n## Key Controls\n\n| Action            | Keyboard   |\n|-------------------|------------|\n| A                 | z          |\n| B                 | x          |\n| Start             | Return     |\n| Select            | Backspace  |\n| D-Pad             | Arrow Keys |\n| Power off         | ESC        |\n| Fullscreen        | Alt-Return |\n\n## Known Issues\n\n* Only works on GNU/Linux due to ABI compatibility\n\n\n## Reference\n\n* [DMG-01: How to Emulate a Game Boy](https://blog.ryanlevick.com/DMG-01/public/book/)\n* [Emulation of Nintendo Game Boy (DMG-01)](https://raw.githubusercontent.com/Baekalfen/PyBoy/master/PyBoy.pdf)\n\n\n## License\n\n`jitboy` is licensed under the GPLv3.\n\nCopyright (C) 2020-2021 National Cheng Kung University, Taiwan.\nOriginally written by Thomas Witte.\n\nExternal source code:\n* The emulation of the audio processing unit (APU) was based on [MiniGBS](https://github.com/baines/MiniGBS), written by Alex Baines. MIT License.\n* [GBIT](https://github.com/koenk/gbit) (Game Boy Instruction Tester), written by Koen Koning. MIT License.\n* DynASM is part of [LuaJIT](https://github.com/LuaJIT/LuaJIT), written by Mike Pall. MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsysprog21%2Fjitboy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsysprog21%2Fjitboy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsysprog21%2Fjitboy/lists"}