{"id":31579127,"url":"https://github.com/antwjadam/z80nextlibrary","last_synced_at":"2026-02-15T12:34:46.544Z","repository":{"id":310062037,"uuid":"1038557111","full_name":"antwjadam/z80nextlibrary","owner":"antwjadam","description":"A collection of spectrum and spectrum next assembly routines of various performance levels.","archived":false,"fork":false,"pushed_at":"2025-09-14T12:10:13.000Z","size":251,"stargazers_count":26,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-05T20:52:32.088Z","etag":null,"topics":["assembly","free","retrodev","routines","spectrum-next","z80","z80n","zx-spectrum"],"latest_commit_sha":null,"homepage":"","language":"Assembly","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antwjadam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-15T12:27:02.000Z","updated_at":"2025-09-21T08:30:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"6c757585-e650-41f2-917d-7133083f6160","html_url":"https://github.com/antwjadam/z80nextlibrary","commit_stats":null,"previous_names":["antwjadam/z80nextlibrary"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/antwjadam/z80nextlibrary","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antwjadam%2Fz80nextlibrary","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antwjadam%2Fz80nextlibrary/tags","releases_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/antwjadam%2Fz80nextlibrary/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antwjadam%2Fz80nextlibrary/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antwjadam","download_url":"https://codeload.github.com/antwjadam/z80nextlibrary/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antwjadam%2Fz80nextlibrary/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29478354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-15T11:35:25.641Z","status":"ssl_error","status_checked_at":"2026-02-15T11:34:57.128Z","response_time":118,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assembly","free","retrodev","routines","spectrum-next","z80","z80n","zx-spectrum"],"created_at":"2025-10-05T20:47:19.069Z","updated_at":"2026-02-15T12:34:46.534Z","avatar_url":"https://github.com/antwjadam.png","language":"Assembly","readme":"# NextLibrary - Z80 Assembly Utilities Library for Spectrum and Next\n\n[![Platform: ZX Spectrum 48K](https://img.shields.io/badge/Platform-ZX%20Spectrum%2048K-blue.svg)](https://en.wikipedia.org/wiki/ZX_Spectrum)\n[![Platform: ZX Spectrum 
128K](https://img.shields.io/badge/Platform-ZX%20Spectrum%20128K-blue.svg)](https://en.wikipedia.org/wiki/ZX_Spectrum)\n[![Platform: ZX Spectrum +2](https://img.shields.io/badge/Platform-ZX%20Spectrum%20%2B2-blue.svg)](https://en.wikipedia.org/wiki/ZX_Spectrum)\n[![Platform: ZX Spectrum +3](https://img.shields.io/badge/Platform-ZX%20Spectrum%20%2B3-blue.svg)](https://en.wikipedia.org/wiki/ZX_Spectrum)\n[![Platform: ZX Spectrum Next](https://img.shields.io/badge/Platform-ZX%20Spectrum%20Next-purple.svg)](https://www.specnext.com/)\n[![Assembly: Z80](https://img.shields.io/badge/Assembly-Z80-green.svg)](https://en.wikipedia.org/wiki/Zilog_Z80)\n[![Assembly: Z80N](https://img.shields.io/badge/Assembly-Z80N-orange.svg)](https://wiki.specnext.dev/Z80N_Extended_Opcodes)\n[![DMA: Supported](https://img.shields.io/badge/DMA-Supported-red.svg)](https://wiki.specnext.dev/DMA)\n[![Layer 2: Supported](https://img.shields.io/badge/Layer%202-Supported-purple.svg)](https://wiki.specnext.dev/Layer_2)\n\n**A high-performance utility library for Z80 assembly development on the ZX Spectrum and ZX Spectrum Next platforms. The choice is yours: you can use device-independent routines or limit yourself to platform-specific routines for a single target architecture.**\n\nNextLibrary provides world-class mathematical operations, random number generation, screen management, DMA operations, and utility functions optimized for retro game development and system programming.\n\nIt offers hardware-independent routines that work on both Spectrum and Spectrum Next hardware. 
It also provides equivalent and optimised Next only versions making use of Z80N Next only extended op codes and DMA for best performance.\n\nT-State tables in this document also allow for easy performance and requirement assessment by the developer.\n\n## Release History\n\n**v1.10** - Plus 3 Hardware Double Buffering\n\nKey improvements:\n- **Plus 3 Double Buffering**: Complete hardware double buffering implementation for Spectrum +3 and Next\n- **Hardware Bank Switching**: True hardware double buffering using +3 memory banking (99.93% faster than software copying)\n- **Smart Buffer Management**: Intelligent off-screen buffer setup with automatic bank switching at $C000\n- **Frame-Synchronized Swapping**: HALT-synchronized buffer swaps to eliminate screen tearing\n- **State Synchronization**: Robust tracking of hardware banking state with automatic correction\n- **Optimized Performance**: Buffer swaps in ~75-95 T-states vs 145,152 T-states for memory copying\n- **Complete Double Buffering Suite**: Software copying (all platforms) + hardware switching (+3/Next)\n- **Developer-Friendly API**: Simple setup, set, and toggle operations with consistent addressing\n\nPlus 3 double buffering performance achievements:\n- **Setup**: ~200 T-states + screen clearing (one-time initialization)\n- **Buffer Set**: ~35-55 T-states (smart banking, only when needed)\n- **Buffer Toggle**: ~75-95 T-states (instant display swap with HALT synchronization)\n- **Performance vs Software**: 99.93% faster than traditional screen copying methods\n- **Frame Rate**: 1 frame per vsync due to use of HALT, so we get 50FPS (PAL 50Hz) or 60FPS (NTSC 60Hz)\n\nArchitecture completion:\n- **Universal Double Buffering**: Software methods (48K/128K/+2/+3/Next) + hardware methods (+3/Next)\n- **Comprehensive Graphics Pipeline**: Traditional screens + Layer 2 + hardware double buffering\n- **Platform Optimization**: Optimal methods for every Spectrum platform and capability\n- **Complete Screen 
Operations**: Clearing, copying, and double buffering with maximum performance\n\n**v1.9** - Layer 2 Screen Copying and Advanced Graphics Pipeline\n\nKey improvements:\n- **Layer 2 Screen Copying**: Complete Layer 2 copying suite with 8 performance levels from LDIRX to DMA burst acceleration\n- **Multi-Resolution Layer 2 Support**: Full support for all Layer 2 modes (256×192, 320×256, 640×256) with optimized copying routines\n- **Layer 2 Manual Copying**: Direct address and resolution specification for maximum control\n  - LDIRX methods: 78,663 to 262,224 T-states depending on resolution\n  - DMA methods: 260 to 600 T-states for hardware-accelerated copying\n- **Layer 2 Auto-Detection Copying**: Intelligent Layer 2 configuration detection with optimal method selection\n  - Auto LDIRX: Automatic resolution detection with Z80N optimization\n  - Auto DMA: Automatic detection with DMA burst for maximum performance\n- **Advanced Graphics Pipeline**: Foundation for sophisticated Layer 2 graphics operations and double-buffering\n- **Unified Copy API**: Single interface supporting traditional screens and all Layer 2 modes seamlessly\n- **Performance Optimization**: Layer 2 256×192 mode 46% faster than traditional screen copying with LDIRX\n\nLayer 2 copying performance achievements:\n- **Layer 2 256×192 LDIRX**: 44.4 FPS (46% faster than traditional screen)\n- **Layer 2 320×256 LDIRX**: 26.7 FPS (enhanced resolution with good performance)\n- **Layer 2 640×256 LDIRX**: 13.3 FPS (maximum resolution with acceptable performance)\n- **Layer 2 DMA Methods**: 5,800+ to 13,500+ FPS (99.5%+ faster than traditional methods)\n- **Cross-Resolution Support**: Same API works across all Layer 2 modes with automatic optimization\n\nNotable pending features:\n- **Hardware Double Buffer Swapping**: Layer 2 bank switching for instant buffer swaps (SCREEN_COPY_LAYER2_DOUBLE_BUFFER_BANK)\n- **Plus 3 Double Buffering**: Spectrum Plus 3 specific double buffering optimization 
(SCREEN_COPY_DOUBLE_BUFFER_PLUS_3)\n\nArchitecture enhancements:\n- **Layer 2 Memory Management**: Intelligent handling of large Layer 2 buffers with multi-bank DMA operations\n- **Resolution-Aware Operations**: Automatic memory size calculation based on Layer 2 mode detection\n- **Hardware Acceleration**: DMA burst modes optimized for different Layer 2 resolutions\n- **Performance Scaling**: Excellent performance across all Next CPU speeds (3.5MHz to 28MHz)\n\n**v1.8** - Advanced Screen Copying and DMA Memory Operations\n\nKey improvements:\n- **Advanced Screen Copying**: Complete screen copying suite with 9 performance levels from basic LDIR to DMA acceleration\n- **Multi-Platform Screen Copy**: Unified API supporting all Spectrum platforms (48K/128K/+2/+3/Next)\n- **Software Optimization Mastery**: Progressive stack-based optimizations (1PUSH through ALLPUSH) achieving up to 27.9 FPS on standard Z80\n- **Z80N Screen Copy Acceleration**: LDIRX optimization providing 31.6 FPS theoretical maximum (31% faster than LDIR)\n- **DMA Screen Copy Revolution**: DMA-based copying achieving 12,500+ FPS theoretical maximum (99.8% faster than LDIR)\n- **Flexible Screen Addressing**: Support for custom source and destination addresses enabling advanced software double-buffering\n- **DMA Memory Copy Utilities**: General-purpose memory copying routines with standard and burst modes\n- **Frame Rate Analysis**: Complete performance analysis across all Next CPU speeds (3.5MHz to 28MHz)\n- **Optimal Algorithm Selection**: Intelligent performance level selection based on available hardware capabilities\n\nScreen copying performance achievements:\n- **Standard Z80 Peak**: ALLPUSH method achieving 27.9 FPS (14% faster than LDIR)\n- **Z80N Enhancement**: LDIRX achieving 31.6 FPS (24% faster than LDIR) \n- **DMA Acceleration**: DMA_BURST achieving 12,500+ FPS (99.8% faster than LDIR)\n\n**v1.7** - Layer 2 Graphics Support and Enhanced Modularity\n\nKey improvements:\n- **62 Test Cases**: 
Expanded test suite from 59 to 62 comprehensive test cases including Layer 2 validation\n- **Layer 2 Graphics Support**: Complete Layer 2 utility functions for Next hardware graphics programming\n- **Layer 2 Screen Clearing**: Ultra-fast Layer 2 clearing using DMA with support for all resolutions (256×192, 320×256, 640×256)\n- **Layer 2 Detection**: Hardware detection for active Layer 2 with resolution and address retrieval\n- **Enhanced Modularity**: Split constants and variables into separate domain-specific files for easier partial library adoption\n- **Improved Developer Experience**: Cleaner file organization allowing developers to include only needed components\n- **Layer 2 Information Retrieval**: Complete Layer 2 configuration detection including resolution, color depth, and memory requirements\n- **Extended Graphics Pipeline**: Foundation for advanced Layer 2 graphics operations and double-buffering\n\nLayer 2 performance improvements:\n- **Layer 2 DMA Clearing**: Up to 99.8% faster Layer 2 screen clearing using DMA burst mode\n- **Resolution Detection**: Fast Layer 2 configuration retrieval (83-157 T-states)\n- **Memory Efficient**: Optimized Layer 2 operations with minimal CPU overhead\n- **Hardware Adaptive**: Automatic fallback to standard operations if Layer 2 unavailable\n\nFile structure improvements:\n- **Modular Constants**: Separate files for Maths, Random, Display, and DMA constants\n- **Organized Variables**: Domain-specific variable files for cleaner partial library usage\n- **Developer Friendly**: Extract only needed components without dependency overhead\n\n**v1.6** - Enhanced Screen Management and DMA Support\n\nKey improvements:\n- **59 Test Cases**: Expanded test suite from 57 to 59 comprehensive test cases including DMA validation\n- **Flexible Screen Addressing**: All screen clearing routines now accept HL parameter for custom screen locations\n- **In-Memory Screen Support**: Full support for off-screen rendering and secondary screen 
buffers\n- **DMA Screen Clearing**: Ultra-fast screen clearing using Spectrum Next DMA controller (99% faster)\n- **Z80N Enhanced Screen Operations**: LDIRX optimization for 33% faster screen clearing on Next hardware\n- **Hardware Detection**: Automatic detection of Z80N and DMA capabilities with graceful fallbacks\n- **Memory Fill Operations**: Complete DMA memory fill routines with burst mode support\n- **Utility Functions**: Enhanced utility library with Z80N detection and DMA availability checking\n- **Performance Optimization**: Comprehensive T-States analysis and optimization across all screen operations\n\nScreen clearing performance improvements:\n- **Standard Z80**: Up to 74% faster with stack-based optimizations\n- **Z80N LDIRX**: Additional 33% improvement using extended opcodes  \n- **DMA Fill**: 99.7% faster pixel clearing, 99.2% faster attribute setting\n- **DMA Burst**: 99.8% faster with maximum hardware acceleration\n\n**v1.5** - Enhanced Test Framework and Maintainability\n\nKey improvements:\n- **Streamlined Test Framework**: Replaced repetitive test execution code with elegant table-driven loop system\n- **Simplified Test Management**: Adding new tests now requires only 3 simple steps: create TestCase0nn routine, add to table, update counter\n- **Reduced Code Complexity**: Test execution code reduced from ~240 lines to ~80 lines (70% reduction)\n- **Improved Maintainability**: Single point of control for test count and execution flow\n- **Enhanced Reliability**: Proper stack management and flag preservation throughout test execution\n- **Developer Experience**: Much easier to add, modify, or debug individual test cases\n- **Code Quality**: Eliminated copy-paste errors and inconsistencies in test execution flow\n\n**v1.4** - Enhanced 16-bit Random Operations with Z80N Support\n\nKey improvements:\n- **57 Test Cases**: Expanded test suite from 53 to 57 comprehensive test cases\n- **16-bit Random Algorithms**: Complete suite of 16-bit random number 
generators (LFSR, XOR Shift, Middle Square) with both standard Z80 and Z80N optimized versions\n- **Z80N 16-bit Random Performance**: 33-38% faster 16-bit random generation using MUL instruction\n- **Hardware-Accelerated 16-bit Random**: Single-cycle multiplication for enhanced Middle Square, optimized bit operations for LFSR/XOR Shift\n- **Seed Compatibility**: Z80N 16-bit random versions maintain identical output sequences to standard algorithms\n- **Complete T-state Analysis**: Accurate performance benchmarks for all random number generation routines\n\n**v1.3** - Enhanced Random 8-bit Operations with Z80N Support\n\nKey improvements:\n- **53 Test Cases**: Expanded test suite from 50 to 53 comprehensive test cases\n- **8-bit Random Algorithms**: Four standard Z80 algorithms (LCG, LFSR, XOR Shift, Middle Square) plus four Z80N optimized versions\n- **Z80N 8-bit Random Performance**: 20-47% faster random generation using MUL instruction\n- **Hardware-Accelerated Random**: Single-cycle multiplication for LCG, optimized bit operations for LFSR/XorShift/MiddleSquare\n- **Seed Compatibility**: Z80N random versions maintain identical output sequences to standard algorithms\n\n**v1.2** - Enhanced Division Operations with Z80N Support\n\nKey improvements:\n- **50 Test Cases**: Expanded test suite from 43 to 50 comprehensive test cases\n- **Enhanced 8÷8 Division**: Three Z80N options (COMPACT hybrid, BALANCED 8-bit reciprocal, MAXIMUM 16-bit reciprocal)\n- **Enhanced 16÷8 Division**: Three Z80N options with hybrid algorithms and high-precision reciprocal methods\n- **Accuracy Validation**: All algorithms pass comprehensive test validation ensuring mathematical correctness\n- **Performance Optimization**: Up to 95% faster division on Spectrum Next hardware\n- **Algorithm Selection**: Intelligent hybrid approaches combining traditional and reciprocal methods for optimal speed/precision balance\n\nThree primary Z80N division approaches are provided:\n1. 
**Hybrid routines**: Combination of MUL DE and traditional subtraction for optimal convergence  \n2. **8-bit Reciprocal methods**: Pre-computed reciprocal tables using Z80N MUL for maximum speed with validated precision\n3. **16-bit Reciprocal methods**: Pre-computed 16-bit reciprocal tables providing the highest precision over 8-bit reciprocals\n\n## Target Platforms\n\nThe following platforms are targeted. The main entry points and individual functionality are tagged with @COMPAT: 48K,128K,+2,+3,NEXT - where the list shown is the known compatibility of the routine. I also detail below the main differences between the platforms, which determine the compatibility of the routines. This means NEXT only routines will be tagged with just @COMPAT: NEXT, and if Z80N op codes are present, then @Z80N: MUL DE, ... will document which extended op codes are being used.\n\n### ZX SPECTRUM 48K:\n- CPU: Z80 @ 3.5MHz\n- Memory: 48KB RAM\n- Features: Basic ULA, beeper sound\n- Limitations: No extra RAM banks, no AY sound\n\n### ZX SPECTRUM 128K:\n- CPU: Z80 @ 3.5MHz  \n- Memory: 128KB RAM (banked)\n- Features: AY-3-8912 sound chip, extra RAM banks\n- New: Memory paging, enhanced sound\n\n### ZX SPECTRUM +2:\n- CPU: Z80 @ 3.5MHz\n- Memory: 128KB RAM (banked) \n- Features: Built-in tape deck, AY sound\n- Differences: Different ROM, tape interface\n\n### ZX SPECTRUM +3:\n- CPU: Z80 @ 3.5MHz\n- Memory: 128KB RAM (banked)\n- Features: Built-in disk drive, +2A/+3 ROM\n- New: Disk interface, different memory map\n\n### ZX SPECTRUM NEXT:\n- CPU: Z80N @ 3.5/7/14/28MHz\n- Memory: 1MB+ RAM, advanced banking\n- Features: Enhanced graphics, sprites, DMA\n- New: Z80N extended opcodes, copper, DMA controller\n\n## 🚀 **Current Features**\n\n### 📊 **Mathematical Operations** - 12 optimized algorithms\n- **8×8 Unsigned Multiplication**: Six performance levels (10-160 T-states)\n  - Standard Z80: COMPACT, BALANCED, MAXIMUM (35-160 T-states)\n  - Next Z80N: NEXT_COMPACT, NEXT_BALANCED, 
NEXT_MAXIMUM (10-29 T-states)\n- **16×8 Unsigned Multiplication**: Six performance levels (45-380 T-states)\n  - Standard Z80: COMPACT, BALANCED, MAXIMUM (45-380 T-states)\n  - Next Z80N: NEXT_COMPACT, NEXT_BALANCED, NEXT_MAXIMUM (97 T-states)\n- **8÷8 Unsigned Division**: Six performance levels (25-1975 T-states)\n  - Standard Z80: COMPACT, BALANCED, MAXIMUM (25-1975 T-states)\n  - Next Z80N: NEXT_COMPACT (40-400 T-states), NEXT_BALANCED (~175 T-states), NEXT_MAXIMUM (~218 T-states)\n- **16÷8 Unsigned Division**: High-precision 16-bit division with Z80N support\n  - Standard Z80: COMPACT, BALANCED, MAXIMUM (45-1300 T-states)\n  - Next Z80N: NEXT_COMPACT/BALANCED/MAXIMUM (107-520 T-states)\n\n### 🎲 **Random Number Generation** - 16 algorithms with Z80N acceleration  \n- **8-bit Random**: Eight algorithms (Standard Z80 + Z80N optimized versions)\n  - Standard Z80: LCG (45-55 T-states), LFSR (85-95 T-states), XorShift (35-45 T-states), Middle Square (115-150 T-states)\n  - Next Z80N: 20-47% faster with hardware MUL instruction\n- **16-bit Random**: Eight algorithms (Standard Z80 + Z80N optimized versions)\n  - Standard Z80: LCG (85-95 T-states), LFSR (68 T-states), XorShift (55 T-states), Middle Square (78 T-states)\n  - Next Z80N: 30-38% faster with hardware acceleration\n\n### 🧹 **Screen Clearing and Memory Fill** - 17 performance levels\n- **Traditional ZX Spectrum Screen**: 9 performance levels from LDIR to DMA\n  - Cross-platform compatibility (48K, 128K, +2, +3, Next)\n  - Performance range: 149,504 T-states (LDIR) to 235 T-states (DMA)\n  - Frame rates: 23.4 FPS (LDIR) to 14,800+ FPS (DMA) at 3.5MHz\n- **Layer 2 Screen Clearing**: Next-only with manual and automatic modes\n  - Support for all resolutions: 256×192, 320×256, 640×256\n  - Manual address specification or automatic detection\n  - DMA acceleration for maximum performance\n\n### 🖥️ **Screen Copying and Memory Transfer** - 17 performance levels\n- **Unified Screen Copying**: Complete screen copying 
suite with 17 performance levels from LDIR to DMA burst acceleration\n  - Cross-platform compatibility: Options available for all Spectrum variants (48K, 128K, +2, +3, Next)\n  - Flexible addressing: Support for custom source and destination screen buffers\n  - Automatic optimization: Hardware detection with optimal performance level selection\n- **Traditional Screen Copying**: 9 performance levels for standard ZX Spectrum screens\n  - **SCREEN_COPY_COMPACT**: Standard LDIR operation (145,152 T-states full copy)\n  - **SCREEN_COPY_1PUSH to SCREEN_COPY_ALLPUSH**: Stack optimizations (173,278 to 124,908 T-states)\n  - **SCREEN_COPY_Z80N_COMPACT**: Z80N LDIRX optimization (110,612 T-states, 24% faster)\n  - **SCREEN_COPY_DMA_FILL**: DMA memory transfer (300 T-states, 99.8% faster)\n  - **SCREEN_COPY_DMA_BURST**: DMA burst mode (270 T-states, 99.8% faster)\n- **Layer 2 Screen Copying**: 8 Next-only performance levels for enhanced graphics\n  - **Manual Layer 2 Methods**: Direct address and resolution specification\n    - LDIRX methods: 78,663 to 262,224 T-states depending on resolution\n    - DMA methods: 260 to 600 T-states for hardware-accelerated copying\n  - **Auto Layer 2 Detection**: Intelligent configuration detection with optimal method selection\n    - Auto LDIRX: Automatic resolution detection with Z80N optimization\n    - Auto DMA: Automatic detection with DMA burst for maximum performance\n- **Frame Rate Capabilities**: Revolutionary performance across Next CPU speeds\n  - **Traditional Screen**: 24.1 FPS (LDIR) to 12,500+ FPS (DMA) at 3.5MHz\n  - **Layer 2 256×192**: 44.4 FPS (LDIRX) to 13,500+ FPS (DMA) at 3.5MHz\n  - **Layer 2 320×256**: 26.7 FPS (LDIRX) to 10,000+ FPS (DMA) at 3.5MHz\n  - **Layer 2 640×256**: 13.3 FPS (LDIRX) to 5,800+ FPS (DMA) at 3.5MHz\n  - **28MHz scaling**: Up to 100,000+ FPS (traditional) / 108,000+ FPS (Layer 2) maximum\n- **Advanced Features**: Double-buffering, off-screen composition, memory-to-memory transfers, 
multi-resolution support\n\n### 🎨 **Layer 2 Graphics** - Complete Next graphics support\n- **Layer 2 Detection**: Hardware detection and configuration retrieval\n- **Resolution Support**: All Layer 2 modes (256×192, 320×256, 640×256)\n- **Memory Management**: Automatic address calculation and bank management\n\n### ⚡ **DMA Support** - 4 hardware-accelerated operations\n- **Memory Operations**: Fill, copy, and burst modes\n- **Hardware Detection**: Automatic fallback if DMA unavailable\n- **Performance**: ~235-300 T-states CPU overhead + parallel hardware transfer\n\n### ⌨️ **Input and Keyboard Utilities** - Cross-platform input handling\n- **Keyboard Scanning**: Comprehensive keyboard input detection across all Spectrum variants\n- **Player Interaction**: Wait for player input with timeout and validation options\n- **Cross-Platform Input**: Unified input handling for 48K, 128K, +2, +3, and Next\n- **Performance Optimized**: Fast input scanning with minimal CPU overhead\n\n### 📝 **Text and Font System** - Embedded font with rendering utilities\n- **Text Rendering**: Advanced text display utilities for all screen modes\n- **Embedded Font**: Built-in font system for consistent text across platforms\n- **String Utilities**: Text manipulation and display positioning functions\n- **Cross-Platform Text**: Unified text rendering across all Spectrum variants\n\n### 🏆 **Scoring and Data Management** - Score conversion and display\n- **Score Conversion**: 16-bit score to display string conversion utilities\n- **Display Integration**: Seamless integration with text rendering system\n- **Performance Optimized**: Fast score display for real-time games\n- **Format Control**: Flexible score formatting and padding options\n\n### 🔧 **Utility Functions** - Hardware detection and memory management\n- **Hardware Detection**: Z80N processor and DMA controller detection\n- **Memory Operations**: Efficient memory management utilities\n\n### Screen Clearing T-States Performance\n\n#### 
Traditional ZX Spectrum Screen (32x24 character, 256x192 pixel)\n\n| Performance Level | Full Clear | Pixel Clear | Attr Clear | Platform | Frame Rate (3.5MHz) |\n|-------------------|------------|-------------|------------|----------|---------------------|\n| **SCREEN_COMPACT** | 149,504 T | 132,608 T | 16,896 T | All | 23.4 FPS |\n| **SCREEN_1PUSH** | 177,152 T | 157,696 T | 19,456 T | All | 19.7 FPS |\n| **SCREEN_2PUSH** | 152,832 T | 135,936 T | 16,896 T | All | 22.9 FPS |\n| **SCREEN_4PUSH** | 140,672 T | 125,056 T | 15,616 T | All | 24.9 FPS |\n| **SCREEN_8PUSH** | 134,592 T | 119,616 T | 15,056 T | All | 26.0 FPS |\n| **SCREEN_ALLPUSH** | 128,512 T | 114,176 T | 14,336 T | All | 27.1 FPS |\n| **SCREEN_Z80N_COMPACT** | 114,176 T | 101,376 T | 12,800 T | Next | 30.6 FPS |\n| **SCREEN_DMA_FILL** | 240 T | 240 T | 240 T | Next | 14,500+ FPS |\n| **SCREEN_DMA_BURST** | 235 T | 235 T | 235 T | Next | 14,800+ FPS |\n\n#### Spectrum Next Layer 2 Screen Clearing (Next Only)\n\n| Performance Level | 256x192 Clear | 320x256 Clear | 640x256 Clear | Frame Rate (3.5MHz) |\n|-------------------|---------------|---------------|---------------|---------------------|\n| **SCREEN_LAYER2_MANUAL_256by192** | ~205,000 T | N/A | N/A | 17.1 FPS |\n| **SCREEN_LAYER2_MANUAL_320by256** | N/A | ~350,000 T | N/A | 10.0 FPS |\n| **SCREEN_LAYER2_MANUAL_640by256** | N/A | N/A | ~700,000 T | 5.0 FPS |\n| **SCREEN_LAYER2_MANUAL_DMA_256by192** | ~280 T | N/A | N/A | 12,500+ FPS |\n| **SCREEN_LAYER2_MANUAL_DMA_320by256** | N/A | ~320 T | N/A | 10,900+ FPS |\n| **SCREEN_LAYER2_MANUAL_DMA_640by256** | N/A | N/A | ~400 T | 8,700+ FPS |\n| **SCREEN_LAYER2_AUTO_ACTIVE** | ~205,000 T | ~350,000 T | ~700,000 T | Variable |\n| **SCREEN_LAYER2_AUTO_DMA** | ~280 T | ~320 T | ~400 T | 8,700+ FPS |\n\n### Layer 2 Memory Requirements\n\n| Layer 2 Mode | Resolution | Memory Size | Bytes to Clear | DMA Operations |\n|--------------|------------|-------------|----------------|----------------|\n| **256x192** 
| 256×192×8bit | 48KB | 49,152 bytes | Single operation |\n| **320x256** | 320×256×8bit | 80KB | 81,920 bytes | Two 40KB operations |\n| **640x256** | 640×256×8bit | 160KB | 163,840 bytes | Four 40KB operations |\n\n### Frame Rate Capabilities Across Next CPU Speeds\n\n#### Traditional Screen Clearing (Fastest: DMA_BURST)\n- **3.5MHz**: 14,800+ FPS maximum (235 T-states per clear)\n- **7MHz**: 29,600+ FPS maximum (117 T-states effective)\n- **14MHz**: 59,200+ FPS maximum (58 T-states effective)\n- **28MHz**: 118,400+ FPS maximum (29 T-states effective)\n\n#### Layer 2 Screen Clearing (Fastest: LAYER2_AUTO_DMA)\n- **3.5MHz**: 8,700+ FPS maximum (400 T-states for 640x256)\n- **7MHz**: 17,400+ FPS maximum (200 T-states effective)\n- **14MHz**: 34,800+ FPS maximum (100 T-states effective)\n- **28MHz**: 69,600+ FPS maximum (50 T-states effective)\n\n### 🖥️ **Screen Copying and Memory Transfer**\n- **Unified Screen Copying**: Complete screen copying suite with 9 performance levels\n  - Cross-platform compatibility: Works on all Spectrum variants (48K, 128K, +2, +3, Next)\n  - Flexible addressing: Support for custom source and destination screen buffers\n  - Automatic optimization: Hardware detection with optimal performance level selection\n- **Performance Progression**: From basic LDIR to ultimate DMA acceleration\n  - **SCREEN_COPY_COMPACT**: Standard LDIR operation (145,152 T-states full copy)\n  - **SCREEN_COPY_1PUSH to SCREEN_COPY_ALLPUSH**: Stack optimizations (173,278 to 124,908 T-states)\n  - **SCREEN_COPY_Z80N_COMPACT**: Z80N LDIRX optimization (110,612 T-states, 24% faster)\n  - **SCREEN_COPY_DMA_FILL**: DMA memory transfer (300 T-states, 99.8% faster)\n  - **SCREEN_COPY_DMA_BURST**: DMA burst mode (270 T-states, 99.8% faster)\n- **Frame Rate Capabilities**: Revolutionary performance across Next CPU speeds\n  - 3.5MHz: Up to 27.9 FPS (ALLPUSH) / 31.6 FPS (Z80N) / 12,500+ FPS (DMA)\n  - 28MHz: Up to 224 FPS (ALLPUSH) / 253 FPS (Z80N) / 100,000+ FPS (DMA)\n- 
**Advanced Features**: Double-buffering, off-screen composition, memory-to-memory transfers\n\n### Screen Copying T-States Performance\n\n| Performance Level | Full Copy | Pixel Copy | Attr Copy | Platform | Frame Rate (3.5MHz) |\n|-------------------|-----------|------------|-----------|----------|---------------------|\n| **SCREEN_COPY_COMPACT** | 145,152 T | 129,024 T | 16,128 T | All | 24.1 FPS |\n| **SCREEN_COPY_1PUSH** | 173,278 T | 154,036 T | 19,342 T | All | 20.2 FPS |\n| **SCREEN_COPY_2PUSH** | 149,086 T | 132,532 T | 16,654 T | All | 23.5 FPS |\n| **SCREEN_COPY_4PUSH** | 136,990 T | 121,780 T | 15,310 T | All | 25.5 FPS |\n| **SCREEN_COPY_8PUSH** | 130,942 T | 116,404 T | 14,638 T | All | 26.7 FPS |\n| **SCREEN_COPY_ALLPUSH** | 124,908 T | 111,042 T | 13,980 T | All | 27.9 FPS |\n| **SCREEN_COPY_Z80N_COMPACT** | 110,612 T | 98,324 T | 12,308 T | Next | 31.6 FPS |\n| **SCREEN_COPY_DMA_FILL** | 300 T | 300 T | 300 T | Next | 12,500+ FPS |\n| **SCREEN_COPY_DMA_BURST** | 270 T | 270 T | 270 T | Next | 12,500+ FPS |\n\n#### Screen Copying Performance Compared with Layer 2\n\n| Method | Traditional Screen | Layer 2 256×192 | Layer 2 320×256 | Layer 2 640×256 |\n|--------|-------------------|-----------------|-----------------|-----------------|\n| **LDIR** | 145,152 T (24.1 FPS) | N/A | N/A | N/A |\n| **ALLPUSH** | 124,908 T (27.9 FPS) | N/A | N/A | N/A |\n| **Z80N LDIRX** | 110,612 T (31.6 FPS) | 78,663 T (44.4 FPS) | 131,112 T (26.7 FPS) | 262,224 T (13.3 FPS) |\n| **DMA** | 270 T (12,500+ FPS) | 260 T (13,500+ FPS) | 350 T (10,000+ FPS) | 600 T (5,800+ FPS) |\n\n### Plus 3 Hardware Double Buffering\n\n#### Plus 3 Double Buffering Support (Hardware Bank Switching)\n\n| Operation | T-States | Platform | Description |\n|-----------|----------|----------|-------------|\n| **PLUS3_SETUP_DOUBLE_BUFFER** | ~200 T + clearing | +3, Next | One-time initialization with screen clearing |\n| **PLUS3_SET_OFFSCREEN_BUFFER** | ~35-55 T | +3, Next | Smart banking (only when 
needed) |\n| **PLUS3_DOUBLE_BUFFER_TOGGLE** | ~75-95 T | +3, Next | Instant buffer swap with HALT sync |\n\n#### Performance Comparison: Software vs Hardware Double Buffering\n\n| Method | T-States | Frame Rate (3.5MHz) | Performance vs Software |\n|--------|----------|---------------------|------------------------|\n| **Software Copy (LDIR)** | 145,152 T | 24.1 FPS | Baseline |\n| **Software Copy (ALLPUSH)** | 124,908 T | 27.9 FPS | 14% faster |\n| **Software Copy (DMA)** | 270 T | 12,500+ FPS | 99.8% faster |\n| **Hardware Toggle (+3)** | ~95 T | Unlimited* | 99.93% faster |\n| **Hardware Toggle (+3)** | ~95 T | **50 FPS** | **VSync limited** |\n\n*Software copying can exceed 50 FPS but results in screen tearing without VSync synchronization\n\n#### Plus 3 Banking Memory Layout\n\n| Memory Range | Visible Screen | Off-Screen Buffer | Screen Display |\n|--------------|----------------|-------------------|----------------|\n| **$4000-$7FFF** | Screen Bank 5 | N/A | ULA displays Bank 5 |\n| **$C000-$FFFF** | N/A | Screen Bank 7 | Ready for drawing |\n| **After Toggle** | Screen Bank 7 | Screen Bank 5 | ULA displays Bank 7 |\n| **$C000-$FFFF** | N/A | Screen Bank 5 | Ready for drawing |\n\n#### Plus 3 Double Buffering Advantages\n\n- **Instant Swaps**: Buffer switching in ~95 T-states regardless of screen content\n- **No Memory Copying**: Hardware bank switching eliminates data transfer overhead\n- **Tearing Elimination**: HALT synchronization ensures clean frame boundaries at 50 FPS\n- **Consistent Performance**: Buffer swap time independent of screen complexity\n- **Memory Efficient**: No duplicate screen data, uses existing +3 banking\n- **Always $C000**: Off-screen buffer always accessible at same address for drawing\n- **Perfect 50 FPS**: Synchronized with display refresh for smooth animation\n\n### Layer 2 Utility T-States\n\n| Function | Scenario | T-States | Description |\n|----------|----------|----------|-------------|\n| **CheckForActiveLayer2** | Next Not 
Found | 109 | Z80N detection fails |\n| **CheckForActiveLayer2** | Layer 2 Inactive | 151 | Z80N found, Layer 2 disabled |\n| **CheckForActiveLayer2** | Layer 2 Active | 157 | Z80N found, Layer 2 enabled |\n| **GetActiveLayer2Addr** | No Layer 2 | 87 | Returns HL = 0 |\n| **GetActiveLayer2Addr** | Active Layer 2 | 83 | Returns Layer 2 base address |\n| **GetLayer2FullInfo** | Variable | 200-300+ | Complete configuration retrieval |\n\n### Layer 2 Constants\n\n```asm\n; Layer 2 Memory Requirements\nLAYER2_BYTES_256x192         EQU     $C000   ; 49,152 bytes (48KB) - 256×192 mode\nLAYER2_BYTES_320x256         EQU     $14000  ; 81,920 bytes (80KB) - 320×256 mode  \nLAYER2_BYTES_640x256         EQU     $28000  ; 163,840 bytes (160KB) - 640×256 mode\nLAYER2_BYTES_320x256_HALF    EQU     $A000   ; 40,960 bytes (40KB) - Half for 16-bit DMA\n\n; Layer 2 Resolutions\nLAYER2_RESOLUTION_256x192    EQU     0       ; Standard Layer 2 resolution\nLAYER2_RESOLUTION_320x256    EQU     1       ; Enhanced Layer 2 resolution\nLAYER2_RESOLUTION_640x256    EQU     2       ; Maximum Layer 2 resolution\n\n; Layer 2 Performance Levels\nSCREEN_LAYER2_DMA_FILL       EQU     9       ; Layer 2 DMA fill mode\nSCREEN_LAYER2_DMA_BURST      EQU     10      ; Layer 2 DMA burst mode\n```\n\n### Random Generation T-States\n\n| Algorithm | 8-bit Standard | 8-bit Z80N | 16-bit Standard | 16-bit Z80N | Improvement |\n|-----------|----------------|------------|-----------------|-------------|-------------|\n| **LCG** | 45-55 | 25-35 | 85-95 | 55-65 | 30-35% faster |\n| **LFSR** | 85-95 | 45-55 | 68 | 42 | 38-47% faster |\n| **XorShift** | 35-45 | 20-30 | 55 | 35 | 36-43% faster |\n| **Middle Square** | 115-150 | 65-75 | 78 | 48 | 33-43% faster |\n\n#### Random Algorithm Constants\n\n```asm\n; 8-bit Standard Z80 Random Algorithms\nPERFORMANCE_STANDARD_RANDOM_LCG       EQU 0    ; Linear Congruential Generator (45-55 T-states)\nPERFORMANCE_STANDARD_RANDOM_LFSR      EQU 1    ; Linear Feedback Shift Register 
(85-95 T-states)\nPERFORMANCE_STANDARD_RANDOM_XORSHIFT  EQU 2    ; XorShift Algorithm (35-45 T-states)\nPERFORMANCE_STANDARD_RANDOM_MIDDLESQUARE EQU 3 ; Middle Square Method (115-150 T-states)\n\n; 8-bit Next Z80N Random Algorithms  \nPERFORMANCE_Z80N_RANDOM_LCG           EQU 4    ; Z80N optimized LCG (25-35 T-states)\nPERFORMANCE_Z80N_RANDOM_LFSR          EQU 5    ; Z80N optimized LFSR (45-55 T-states)\nPERFORMANCE_Z80N_RANDOM_XORSHIFT      EQU 6    ; Z80N optimized XorShift (20-30 T-states)\nPERFORMANCE_Z80N_RANDOM_MIDDLESQUARE  EQU 7    ; Z80N optimized Middle Square (65-75 T-states)\n\n; 16-bit Standard Z80 Random Algorithms\nPERFORMANCE_STANDARD_RANDOM16_LCG      EQU 0   ; Linear Congruential Generator (85-95 T-states)\nPERFORMANCE_STANDARD_RANDOM16_LFSR     EQU 1   ; Linear Feedback Shift Register (68 T-states)\nPERFORMANCE_STANDARD_RANDOM16_XORSHIFT EQU 2   ; XorShift Algorithm (55 T-states)\nPERFORMANCE_STANDARD_RANDOM16_MIDDLESQUARE EQU 3 ; Middle Square Method (78 T-states)\n\n; 16-bit Next Z80N Random Algorithms\nPERFORMANCE_Z80N_RANDOM16_LCG          EQU 4   ; Z80N optimized 16-bit LCG (55-65 T-states)\nPERFORMANCE_Z80N_RANDOM16_LFSR         EQU 5   ; Z80N optimized 16-bit LFSR (42 T-states)\nPERFORMANCE_Z80N_RANDOM16_XORSHIFT     EQU 6   ; Z80N optimized 16-bit XorShift (35 T-states)\nPERFORMANCE_Z80N_RANDOM16_MIDDLESQUARE EQU 7   ; Z80N optimized 16-bit Middle Square (48 T-states)\n```\n\n### Utility Functions T-States\n\n| Function | Available Path | Not Available Path | Description |\n|----------|---------------|-------------------|-------------|\n| **CheckOnZ80N** | 81 T-states | 60 T-states | Z80N processor detection |\n| **CheckDMAAvailable** | 66 T-states | 58 T-states | DMA controller detection |\n\n### ⚡ **DMA Support (Spectrum Next)**\n\n- **DMA Memory Fill**: High-speed memory filling using DMA controller\n  - CPU overhead: ~240-260 T-states for setup\n  - Hardware transfer: Parallel to CPU execution\n  - Automatic fallback to standard 
routines if DMA unavailable\n- **DMA Burst Fill**: Maximum performance burst mode operations\n  - CPU overhead: ~235-250 T-states for setup\n  - Fastest possible memory operations on Next hardware\n  - Optimized wait loops with comprehensive status checking\n- **DMA Memory Copy**: General-purpose memory copying using DMA controller\n  - CPU overhead: ~280-300 T-states for setup\n  - Hardware transfer: Parallel to CPU execution (effectively instantaneous)\n  - Support for any source/destination addresses and transfer sizes\n- **DMA Memory Copy Burst**: Ultra-fast burst mode memory copying\n  - CPU overhead: ~260-280 T-states for setup\n  - Maximum DMA performance with burst transfer modes\n  - Ideal for screen copying and large memory operations\n\n### DMA Operations T-States\n\n| Operation | CPU Overhead | Hardware Time | Use Case |\n|-----------|-------------|---------------|----------|\n| **DMA_MemoryFill** | ~240-260 T | Parallel | Memory clearing, pattern fills |\n| **DMA_BurstFill** | ~235-250 T | Parallel | Ultra-fast memory clearing |\n| **DMA_MemoryCopy** | ~280-300 T | Parallel | Memory copying, screen copying |\n| **DMA_MemoryCopy_Burst** | ~260-280 T | Parallel | Ultra-fast memory copying |\n\n## 🧮 **Random Number Generation Algorithms**\n\nNextLibrary provides comprehensive 8-bit and 16-bit random number generation algorithms optimized for different scenarios:\n\n### 8-bit Standard Z80 Methods\n- **LCG (Linear Congruential Generator)**: Fast with good uniformity (45-55 T-states)\n  - Best for: High-speed applications requiring uniform distribution\n  - Uses: Multiply-add formula (a*seed + c) mod m\n  - Quality: Good distribution, very fast, widely used in games\n\n- **LFSR (Linear Feedback Shift Register)**: High-quality randomness (85-95 T-states)\n  - Best for: High-quality, non-repeating sequences (an 8-bit LFSR is not cryptographically secure)\n  - Uses: Bit shifting with XOR feedback polynomial\n  - Quality: Excellent distribution, maximum period (255 values)\n\n- **XorShift**: Fast 
with good quality (35-45 T-states)\n  - Best for: Games requiring fast, good-quality random numbers\n  - Uses: XOR and bit shifting operations\n  - Quality: Very good distribution, fast execution\n\n- **Middle Square**: Classic algorithm with moderate speed (115-150 T-states)\n  - Best for: Educational purposes, moderate quality needs\n  - Uses: Squares the seed and extracts middle digits\n  - Quality: Fair distribution, requires careful seed management\n\n### 8-bit Next Z80N Methods (Spectrum Next Only)\n- **Z80N_LCG**: Hardware-accelerated LCG (25-35 T-states)\n  - Best for: Ultra-fast uniform random generation\n  - Uses: Z80N MUL for single-instruction multiplication in LCG formula\n  - Quality: Same as standard LCG, 35% faster execution\n\n- **Z80N_LFSR**: Z80N optimized LFSR (45-55 T-states)\n  - Best for: High-quality randomness with hardware acceleration\n  - Uses: Z80N MUL for efficient bit extraction and manipulation\n  - Quality: Same as standard LFSR, 47% faster execution\n\n- **Z80N_XorShift**: Ultra-fast XorShift (20-30 T-states)\n  - Best for: Fastest possible random generation with good quality\n  - Uses: Z80N MUL for optimized bit operations\n  - Quality: Same as standard XorShift, 43% faster execution\n\n- **Z80N_MiddleSquare**: Hardware-accelerated Middle Square (65-75 T-states)\n  - Best for: Classic algorithm with modern performance\n  - Uses: Z80N MUL for single-instruction squaring operation\n  - Quality: Same as standard Middle Square, 43% faster execution\n\n### 16-bit Standard Z80 Methods\n- **LCG (Linear Congruential Generator)**: Fast 16-bit uniform generation (85-95 T-states)\n  - Best for: High-speed 16-bit applications requiring uniform distribution\n  - Uses: Extended 16-bit multiply-add formula\n  - Quality: Good distribution, fastest 16-bit method, widely used\n\n- **LFSR (Linear Feedback Shift Register)**: High-quality 16-bit randomness (68 T-states)\n  - Best for: High-quality sequences with extended precision\n  - Uses: 16-bit polynomial 
feedback with bit manipulation\n  - Quality: Excellent distribution, maximum period (65535 values)\n\n- **XorShift**: Fast 16-bit generation (55 T-states)\n  - Best for: Fast 16-bit random numbers with good quality\n  - Uses: Extended XOR and bit shifting operations\n  - Quality: Very good distribution, optimal speed/quality balance\n\n- **Middle Square**: 16-bit classic algorithm (78 T-states)\n  - Best for: Extended precision middle square method\n  - Uses: 16-bit squaring with middle extraction\n  - Quality: Good distribution with careful seed management\n\n### 16-bit Next Z80N Methods (Spectrum Next Only)\n- **Z80N_LCG**: Hardware-accelerated 16-bit LCG (55-65 T-states)\n  - Best for: Ultra-fast 16-bit uniform random generation\n  - Uses: Z80N MUL for efficient 16-bit LCG calculations\n  - Quality: Same as standard 16-bit LCG, 30% faster execution\n\n- **Z80N_LFSR**: Z80N optimized 16-bit LFSR (42 T-states)\n  - Best for: High-quality 16-bit randomness with hardware acceleration\n  - Uses: Z80N MUL for efficient 16-bit polynomial operations\n  - Quality: Same as standard 16-bit LFSR, 38% faster execution\n\n- **Z80N_XorShift**: Ultra-fast 16-bit XorShift (35 T-states)\n  - Best for: Fastest possible 16-bit random generation\n  - Uses: Z80N MUL for optimized 16-bit bit operations\n  - Quality: Same as standard 16-bit XorShift, 36% faster execution\n\n- **Z80N_MiddleSquare**: Hardware-accelerated 16-bit Middle Square (48 T-states)\n  - Best for: 16-bit classic algorithm with modern performance\n  - Uses: Z80N MUL for hardware-assisted 16-bit squaring operation\n  - Quality: Same as standard 16-bit Middle Square, 38% faster execution\n\n### Z80N Performance Benefits\n\nThe Z80N optimized versions provide significant performance improvements:\n\n- **Hardware Multiplication**: Single-instruction MUL (8×8→16) vs multi-instruction shift-and-add loops\n- **Efficient Bit Operations**: MUL-based bit extraction vs traditional shifting\n- **Maintained Quality**: Identical mathematical 
properties and output sequences\n- **Same Seed Compatibility**: Drop-in replacements for standard versions\n\n**Performance Summary**: Z80N random generators are 30-47% faster while maintaining identical output quality and seed compatibility with their standard Z80 counterparts.\n\n## 📝 **TODO List**\n\n### 🖥️ **Display \u0026 Graphics**\n- Extended screen manipulation functions, e.g. line draw, fill, patterned fill\n- Software sprite routines and Next hardware sprite routines\n\n### 🎮 **Input Systems**  \n- Joystick input support with multiple controller options\n- Enhanced text input utilities and keyboard handling\n\n### 🏆 **Scoring \u0026 Data**\n- Extended scoring system supporting up to 12-digit scores (beyond 65535)\n- Leaderboard and score table management utilities\n\n### 🔊 **Audio Support**\n- Beeper sound utilities (tags likely to be @COMPAT: 48K,128K,+2,+3,NEXT)\n- AY sound routines (tags likely to be @COMPAT: 128K,+2,+3,NEXT, @REQUIRES: AY-3-8912)\n\n### 🏦 **Memory Banking**\n- Memory bank switching and loading (tags likely to be @COMPAT: 128K,+2,+3,NEXT, @REQUIRES: Memory banking)\n\n### 💾 **Loading and Saving**\n- Tape routines (tags likely to be @COMPAT: 48K,128K,+2,+3,NEXT)\n- Microdrive routines (tags likely to be @COMPAT: 48K,128K,+2,+3) - likely requires Next to be in a required compatibility mode with microdrive interface attached and active - not sure yet\n- Disk routines (tags likely to be @COMPAT: +3, @REQUIRES: +3 disk interface) - potentially no Next compatibility as very hardware based\n\n### 🚀 **Advanced Next Features**\n*Tagged @COMPAT NEXT as they will be specific to the Next, this list is not exhaustive*\n- Sprites (tags likely to be @COMPAT: NEXT, @REQUIRES: Next sprites)\n- Copper (tags likely to be @COMPAT: NEXT, @REQUIRES: Next copper)  \n- Enhanced DMA operations (pattern fills, memory copies, etc.)\n- More features... 
(one thing added at a time)\n\n### ⚡ **Optimization**\n- Complete T-state optimization pass (e.g., replace JR with JP where beneficial)\n- Memory usage optimization analysis\n\n## 🎯 **Performance Levels**\n\nNextLibrary uses a unified performance system across all mathematical operations:\n\n| Performance Level | Characteristics | Use Case |\n|-------------------|----------------|----------|\n| **PERFORMANCE_COMPACT** | Variable timing, minimal code size | Memory-constrained applications |\n| **PERFORMANCE_BALANCED** | Fixed timing, predictable performance | Real-time applications, games |\n| **PERFORMANCE_MAXIMUM** | Optimized for speed, larger code size | Performance-critical operations |\n| **PERFORMANCE_NEXT_COMPACT** | Z80N MUL instruction, fastest code | Next-only, maximum speed |\n| **PERFORMANCE_NEXT_BALANCED** | Z80N MUL with overflow checking | Next-only, speed + validation |\n| **PERFORMANCE_NEXT_MAXIMUM** | Z80N MUL with special case handling | Next-only, optimized edge cases |\n\n## 📋 **Quick Start**\n\n### Basic Usage Example\n\n```asm\n; Include the NextLibrary\nINCLUDE \"NextLibrary.asm\"\n\n; 8×8 Multiplication Example (Standard Z80)\nLD      A, 25           ; Multiplicand\nLD      B, 12           ; Multiplier  \nLD      C, PERFORMANCE_BALANCED\nCALL    Multiply8x8_Unified\n; Result in HL = 300\n\n; 8×8 Multiplication Example (Next Z80N - Ultra Fast!)\nLD      A, 25           ; Multiplicand\nLD      B, 12           ; Multiplier\nLD      C, PERFORMANCE_NEXT_COMPACT   ; Uses Z80N MUL instruction\nCALL    Multiply8x8_Unified\n; Result in HL = 300 (85% faster!)\n\n; 8-bit Random Number Generation\nLD      A, 15           ; Upper limit (inclusive)\nLD      B, 123          ; Seed value\nLD      C, PERFORMANCE_Z80N_RANDOM_LFSR ; Z80N LFSR algorithm\nCALL    Random8_Unified_Seed\n; First random value in A\n\nLD      A, 7            ; New upper limit\nLD      C, PERFORMANCE_Z80N_RANDOM_LFSR ; Same algorithm  \nCALL    Random8_Unified_Next\n; Next random 
value in A\n\n; 16-bit Random Number Generation\nLD      HL, 1000        ; Upper limit (inclusive)\nLD      BC, 9876        ; Seed value\nLD      D, PERFORMANCE_Z80N_RANDOM16_MIDDLESQUARE ; Z80N 16-bit Middle Square\nCALL    Random16_Unified_Seed\n; First random value in HL\n\nLD      HL, 5000        ; New upper limit\nLD      D, PERFORMANCE_Z80N_RANDOM16_MIDDLESQUARE ; Same algorithm\nCALL    Random16_Unified_Next  \n; Next random value in HL\n```\n\n### Enhanced Screen Management Examples\n\n```asm\n; Clear entire screen with white on black attributes (standard screen)\nLD      A, %00000111    ; White ink, black paper\nLD      C, SCREEN_4PUSH ; High performance mode\nLD      HL, 0           ; Use default screen address (16384)\nCALL    Screen_FullReset_Unified\n\n; Clear off-screen buffer with different attributes\nLD      A, %01000010    ; Green ink, black paper, bright\nLD      C, SCREEN_DMA_FILL ; Ultra-fast DMA clearing (Next only)\nLD      HL, $8000       ; Custom screen buffer address\nCALL    Screen_FullReset_Unified\n\n; Clear pixels only in secondary screen buffer\nLD      A, 0            ; Not used for pixel clearing\nLD      C, SCREEN_Z80N_COMPACT ; Z80N LDIRX optimization\nLD      HL, $C000       ; Another screen buffer\nCALL    Screen_ClearPixel_Unified\n\n; Set attributes only in main screen, preserve pixels\nLD      A, %01000010    ; Green ink, black paper, bright\nLD      C, SCREEN_DMA_BURST ; Maximum DMA performance (Next only)\nLD      HL, 0           ; Use default screen address\nCALL    Screen_ClearAttr_Unified\n\n; High-speed double buffering example\n; Clear back buffer at maximum speed while displaying front buffer\nLD      A, %00000111    ; White on black\nLD      C, SCREEN_DMA_BURST ; Fastest possible\nLD      HL, BackBuffer  ; Address of back buffer\nCALL    Screen_FullReset_Unified\n; ... render to back buffer ...\n; ... 
swap buffers when ready ...\n\n; Hardware detection example\nCALL    CheckOnZ80N     ; Check if Z80N available\nJR      Z, UseStandard  ; Use standard routines if not Z80N\n\nCALL    CheckDMAAvailable ; Check if DMA available\nJR      Z, Z80NOnly     ; Use Z80N only if no DMA\n\n; Use DMA for maximum performance\nLD      C, SCREEN_DMA_BURST\nJR      ClearScreen\n\nZ80NOnly:\nLD      C, SCREEN_Z80N_COMPACT ; Use Z80N optimizations\nJR      ClearScreen\n\nUseStandard:\nLD      C, SCREEN_ALLPUSH\n\nClearScreen:\nLD      A, %00000111    ; Screen attribute\nLD      HL, 0           ; Default screen\nCALL    Screen_FullReset_Unified\n```\n\n### Layer 2 Screen Clearing Examples\n\n```asm\n; Basic Layer 2 clear with manual address (256x192 mode)\nLD      HL, $4000               ; Layer 2 bank address\nLD      A, $E3                  ; Bright yellow color (palette index)\nLD      C, SCREEN_LAYER2_MANUAL_256by192\nCALL    Screen_FullReset_Unified\n\n; High-resolution Layer 2 clear (320x256 mode)\nLD      HL, $6000               ; Layer 2 bank address for 320x256\nLD      A, $1F                  ; Bright blue color\nLD      C, SCREEN_LAYER2_MANUAL_320by256\nCALL    Screen_FullReset_Unified\n\n; Maximum resolution Layer 2 clear (640x256 mode)\nLD      HL, $8000               ; Layer 2 bank address for 640x256\nLD      A, $07                  ; White color\nLD      C, SCREEN_LAYER2_MANUAL_640by256\nCALL    Screen_FullReset_Unified\n\n; Ultra-fast DMA Layer 2 clear (auto-detection)\nLD      HL, 0                   ; Use current active Layer 2\nLD      A, $C0                  ; Red color\nLD      C, SCREEN_LAYER2_AUTO_DMA\nCALL    Screen_FullReset_Unified\n\n; Manual DMA Layer 2 clear for maximum control\nLD      HL, $4000               ; Specific Layer 2 address\nLD      A, $38                  ; Orange color\nLD      C, SCREEN_LAYER2_MANUAL_DMA_256by192\nCALL    Screen_FullReset_Unified\n\n; Layer 2 double buffering setup\n; Clear back buffer\nLD      HL, $8000               ; 
Back buffer Layer 2 address\nLD      A, $00                  ; Black background\nLD      C, SCREEN_LAYER2_MANUAL_DMA_256by192\nCALL    Screen_FullReset_Unified\n\n; ... render graphics to back buffer ...\n\n; Switch Layer 2 to display back buffer\nLD      BC, LAYER2_REGISTER_SELECT_PORT\nLD      A, LAYER2_ADDRESS_REGISTER\nOUT     (C), A\nLD      BC, LAYER2_REGISTER_DATA_PORT\nLD      A, $20                  ; Point to back buffer bank\nOUT     (C), A\n\n; Automatic Layer 2 mode detection and clearing\nLD      HL, 0                   ; Use active Layer 2\nLD      A, $F0                  ; Bright magenta\nLD      C, SCREEN_LAYER2_AUTO_ACTIVE\nCALL    Screen_FullReset_Unified\n\n; Performance comparison - clear traditional screen vs Layer 2\n; Traditional screen (fastest software method)\nLD      HL, SCREEN_PIXEL_BASE   ; Traditional screen\nLD      A, $07                  ; White on black\nLD      C, SCREEN_ALLPUSH       ; 27.1 FPS\nCALL    Screen_FullReset_Unified\n\n; Layer 2 screen (DMA method)\nLD      HL, 0                   ; Active Layer 2\nLD      A, $FF                  ; White\nLD      C, SCREEN_LAYER2_AUTO_DMA ; 8,700+ FPS\nCALL    Screen_FullReset_Unified\n\n; Multi-resolution Layer 2 game setup\n; Check Layer 2 capabilities and set optimal mode\nCALL    Layer2_GetActiveMode    ; Returns mode in A\nCP      2                       ; Check if 640x256 mode\nJR      Z, Setup640x256\nCP      1                       ; Check if 320x256 mode  \nJR      Z, Setup320x256\n; Default to 256x192\nLD      C, SCREEN_LAYER2_AUTO_DMA\nJR      SetupComplete\n\nSetup640x256:\nLD      C, SCREEN_LAYER2_MANUAL_DMA_640by256\nJR      SetupComplete\n\nSetup320x256:\nLD      C, SCREEN_LAYER2_MANUAL_DMA_320by256\n\nSetupComplete:\nLD      HL, 0                   ; Use active Layer 2\nLD      A, $00                  ; Black background\nCALL    Screen_FullReset_Unified\n```\n\n### Screen Copying Examples\n\n```asm\n; Basic screen copy from buffer to display\nLD      HL, BackBuffer    
      ; Source address\nLD      DE, 0                   ; Destination (0 = default screen)\nLD      BC, 6912                ; Full screen size\nLD      C, SCREEN_COPY_COMPACT  ; Standard LDIR method\nCALL    Screen_FullCopy_Unified\n\n; High-performance copy using stack optimization\nLD      HL, OffScreenBuffer     ; Source address\nLD      DE, SCREEN_PIXEL_BASE   ; Destination address\nLD      BC, 6144                ; Pixel area only\nLD      C, SCREEN_COPY_ALLPUSH  ; Fastest software method (27.9 FPS)\nCALL    Screen_PixelCopy_Unified\n\n; Next-only Z80N optimized copy\nLD      HL, BackBuffer          ; Source address\nLD      DE, 0                   ; Destination (default screen)\nLD      BC, 6912                ; Full screen\nLD      C, SCREEN_COPY_Z80N_COMPACT ; Z80N LDIRX (31.6 FPS)\nCALL    Screen_FullCopy_Unified\n\n; Ultra-fast DMA copy (Next only)\nLD      HL, BackBuffer          ; Source address\nLD      DE, SCREEN_PIXEL_BASE   ; Destination address\nLD      BC, 6912                ; Full screen\nLD      C, SCREEN_COPY_DMA_BURST ; DMA burst mode (12,500+ FPS)\nCALL    Screen_FullCopy_Unified\n\n; Attribute-only copy for color changes\nLD      HL, AttributeBuffer     ; Source attributes\nLD      DE, SCREEN_ATTR_BASE    ; Destination attributes\nLD      BC, 768                 ; Attribute area size\nLD      C, SCREEN_COPY_4PUSH    ; Fast stack method\nCALL    Screen_AttrCopy_Unified\n\n; Automatic hardware detection and optimal performance\nCALL    CheckOnZ80N             ; Check for Z80N\nJR      Z, StandardHardware     ; Jump if standard Z80\n\nCALL    CheckDMAAvailable       ; Check for DMA\nJR      Z, Z80NOnly            ; Jump if no DMA\n\n; Use DMA for maximum performance\nLD      C, SCREEN_COPY_DMA_BURST\nJR      CopyScreen\n\nZ80NOnly:\nLD      C, SCREEN_COPY_Z80N_COMPACT\nJR      CopyScreen\n\nStandardHardware:\nLD      C, SCREEN_COPY_ALLPUSH\n\nCopyScreen:\nLD      HL, BackBuffer          ; Source\nLD      DE, 0                   ; 
Destination\nCALL    Screen_FullCopy_Unified\n\n; Double buffering with optimal performance\n; Render to back buffer\nLD      HL, BackBuffer          ; Clear back buffer first\nLD      A, $07                  ; White on black\nLD      C, SCREEN_DMA_BURST     ; Ultra-fast clearing\nCALL    Screen_FullReset_Unified\n\n; ... render graphics to back buffer ...\n\n; Copy back buffer to display\nLD      HL, BackBuffer          ; Source\nLD      DE, 0                   ; Destination (screen)\nLD      C, SCREEN_COPY_DMA_BURST ; Ultra-fast copy\nCALL    Screen_FullCopy_Unified\n\n; High-performance Layer 2 copy: the active Layer 2 address is auto-detected, so only the back buffer address is supplied as the source.\nLD      HL, BackBuffer          ; Source: off-screen Layer 2 buffer\nLD      DE, ActiveLayer2        ; Destination: active Layer 2 display\nLD      C, SCREEN_COPY_LAYER2_AUTO_DMA ; Auto-detect mode, use DMA\nCALL    Screen_FullCopy_Unified\n\n; Manual Layer 2 copying for specific resolutions\nLD      HL, Layer2Buffer        ; Source buffer\nLD      DE, $4000               ; Layer 2 display address\nLD      C, SCREEN_COPY_LAYER2_MANUAL_DMA_256by192 ; Fastest for 256×192\nCALL    Screen_FullCopy_Unified\n```\n\n### Plus 3 Hardware Double Buffering Examples\n\n```asm\n; Initialize Plus 3 double buffering (call once at start)\n    CALL    PLUS3_SETUP_DOUBLE_BUFFER       ; Bank 5 visible, Bank 7 at $C000\n; Now ready for hardware double buffering\n\nGameLoop:\n    ; Ensure off-screen buffer is ready for drawing\n    CALL    PLUS3_SET_OFFSCREEN_BUFFER   ; HL = $C000, A = off-screen bank number\n    ; Smart routine - only performs banking if needed (~35-55 T-states)\n    \n    ; Clear off-screen buffer (always at $C000)\n    LD      A, %00000111                 ; White on black attributes\n    LD      C, SCREEN_DMA_BURST          ; Use fastest clearing method\n    ; HL already contains $C000 from SET_OFFSCREEN_BUFFER\n    CALL    Screen_FullReset_Unified\n    \n    ; Draw your frame to $C000 
(off-screen buffer)\n    ; ... all rendering code draws to $C000 ...\n    ; ... sprites, backgrounds, text, etc. ...\n    \n    ; Instant buffer swap (~95 T-states total)\n    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE   ; Hardware switch + HALT sync\n    ; Previous off-screen buffer now visible, previous visible now off-screen\n    \n    ; Check for game exit condition\n    CALL    ScanKeyboard\n    BIT     0, A                         ; Check for SPACE\n    JR      Z, GameLoop                  ; Continue if not pressed\n    RET                                  ; Exit game\n\n; Advanced Plus 3 double buffering with performance optimization\nOptimizedGameLoop:\n    ; Get off-screen buffer (smart banking)\n    CALL    PLUS3_SET_OFFSCREEN_BUFFER   ; HL = $C000, A = bank number\n    \n    ; Ultra-fast screen clearing using DMA\n    LD      A, $00                       ; Black screen\n    LD      C, SCREEN_DMA_BURST          ; Maximum performance\n    CALL    Screen_FullReset_Unified     ; Clear in ~235 T-states\n    \n    ; High-performance rendering to off-screen buffer\n    ; All drawing operations target $C000\n    \n    ; Draw sprites using fast multiplication\n    LD      A, SpriteX                   ; Sprite X position\n    LD      B, 32                        ; Screen width in characters\n    LD      C, PERFORMANCE_NEXT_COMPACT  ; Z80N MUL for speed\n    CALL    Multiply8x8_Unified          ; HL = screen offset\n    LD      BC, $C000                    ; Off-screen buffer base\n    ADD     HL, BC                       ; HL = sprite screen address\n    ; ... draw sprite at HL ...\n    \n    ; Generate random enemy positions using Z80N\n    LD      A, 255                       ; Screen width range\n    LD      C, PERFORMANCE_Z80N_RANDOM_XORSHIFT ; Fastest random\n    CALL    Random8_Unified_Next         ; A = random X position\n    ; ... 
use A for enemy placement ...\n    \n    ; Hardware buffer swap (instant display update)\n    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE   ; ~95 T-states\n    \n    ; Game logic during next frame preparation\n    CALL    UpdatePlayerPosition\n    CALL    MoveEnemies\n    CALL    CheckCollisions\n    \n    JR      OptimizedGameLoop\n\n; Multi-platform double buffering selection\n; Automatically choose best method based on available hardware\nInitializeDoubleBuffering:\n    ; Check if we're on +3 or Next with banking support\n    LD      A, $FF                       ; Test value\n    LD      ($C000), A                   ; Try to write to potential RAM\n    LD      A, ($C000)                   ; Read back\n    CP      $FF                          ; Check if write worked\n    JR      NZ, UseSoftwareBuffering     ; Use software if no banking\n    \n    ; Hardware double buffering available\n    CALL    PLUS3_SETUP_DOUBLE_BUFFER    ; Initialize hardware method\n    LD      A, 1                         ; Flag for hardware buffering\n    LD      (BufferingMode), A\n    RET\n\nUseSoftwareBuffering:\n    ; Set up software double buffering\n    LD      HL, BackBuffer               ; Allocate back buffer\n    LD      A, 0                         ; Flag for software buffering\n    LD      (BufferingMode), A\n    RET\n\n; Unified buffer swap routine (hardware or software)\nSwapBuffers:\n    LD      A, (BufferingMode)\n    OR      A                            ; Check buffering mode\n    JR      Z, SoftwareSwap              ; Jump if software mode\n    \n    ; Hardware buffer swap\n    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE   ; ~95 T-states\n    RET\n\nSoftwareSwap:\n    ; Software buffer copy\n    LD      HL, BackBuffer               ; Source: back buffer\n    LD      DE, 0                        ; Destination: screen\n    LD      C, SCREEN_COPY_DMA_BURST     ; Use fastest available copy\n    CALL    Screen_FullCopy_Unified      ; Copy buffer to screen\n    RET\n\n; Performance comparison 
demonstration\nPerformanceTest:\n    ; Test software copying speed\n    LD      B, 100                       ; 100 iterations\n    CALL    StartTimer\nSoftwareLoop:\n    LD      HL, TestBuffer\n    LD      DE, 0\n    LD      C, SCREEN_COPY_ALLPUSH\n    CALL    Screen_FullCopy_Unified\n    DJNZ    SoftwareLoop\n    CALL    StopTimer                    ; Record software time\n    \n    ; Test hardware switching speed  \n    CALL    PLUS3_SETUP_DOUBLE_BUFFER\n    LD      B, 100                       ; 100 iterations\n    CALL    StartTimer\nHardwareLoop:\n    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE\n    DJNZ    HardwareLoop\n    CALL    StopTimer                    ; Record hardware time\n    \n    ; Hardware method will be ~1500× faster!\n    RET\n\n; Variables\nBufferingMode:          DB      0       ; 0=software, 1=hardware\nBackBuffer:             DS      6912    ; Software back buffer\nTestBuffer:             DS      6912    ; Test data buffer\n```\n\n#### Plus 3 Double Buffering Best Practices\n\n```asm\n; 1. Always initialize double buffering before use\n    CALL    PLUS3_SETUP_DOUBLE_BUFFER       ; One-time setup\n\n; 2. Always call SET_OFFSCREEN_BUFFER before drawing\nMainLoop:\n    CALL    PLUS3_SET_OFFSCREEN_BUFFER   ; Ensures correct bank at $C000\n    ; Now safe to draw to $C000\n    \n; 3. Use HALT synchronization for smooth animation\n    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE   ; Includes HALT for sync\n    JR      MainLoop\n\n; 4. Combine with DMA for maximum performance\n    LD      A, ClearValue\n    LD      C, SCREEN_DMA_BURST          ; Ultra-fast clearing\n    ; HL already set to $C000 by SET_OFFSCREEN_BUFFER\n    CALL    Screen_FullReset_Unified\n\n; 5. Leverage always-$C000 addressing for consistent code\nDrawSprite:\n    ; Off-screen buffer is always at $C000 after SET_OFFSCREEN_BUFFER\n    LD      HL, $C000                    ; Base address\n    ; ... add sprite offset ...\n    ; ... draw sprite data ...\n    RET\n\n; 6. 
Error handling for unsupported hardware\nInitGame:\n    CALL    PLUS3_SETUP_DOUBLE_BUFFER\n    ; Check if setup succeeded by testing banking\n    LD      A, $AA\n    LD      ($C000), A\n    LD      A, ($C000)\n    CP      $AA\n    JR      Z, BufferingOK\n    \n    ; Fall back to software rendering\n    JP      InitSoftwareMode\n    \nBufferingOK:\n    ; Continue with hardware double buffering\n    JP      InitHardwareMode\n```\n\n### Layer 2 Graphics Examples\n\n```asm\n; Detect if Layer 2 is available and active\nCALL    CheckForActiveLayer2    ; Check Layer 2 availability\nJR      Z, NoLayer2            ; Jump if not available\n\n; Get Layer 2 configuration\nCALL    GetLayer2FullInfo      ; Get complete Layer 2 info\n; Layer2Resolution now contains: 0=256×192, 1=320×256, 2=640×256\n; Layer2Width/Height contain pixel dimensions\n; Layer2Bpp contains color depth (4 or 8 bits per pixel)\n\n; Clear Layer 2 screen with DMA acceleration\nLD      A, $FF                 ; Fill color (white in 8bpp mode)\nLD      C, SCREEN_LAYER2_DMA_BURST ; Maximum performance\nCALL    GetActiveLayer2Addr    ; Get Layer 2 address in HL\nCALL    Screen_Layer2Clear_Unified ; Clear Layer 2 screen\n\n; Double buffering example\nCALL    GetActiveLayer2Addr    ; Get current display buffer\nLD      (DisplayBuffer), HL    ; Store display buffer\nLD      HL, BackBuffer         ; Set back buffer address\n; ... render to back buffer ...\n; ... 
swap buffers when ready ...\n\nNoLayer2:\n; Fall back to standard ULA screen operations\nLD      HL, 0                  ; Use standard screen\nLD      A, $07                 ; White on black\nLD      C, SCREEN_DMA_BURST    ; Use DMA if available\nCALL    Screen_FullReset_Unified\n```\n\n### DMA Memory Operations\n\n```asm\n; Fill large memory area using DMA\nLD      HL, $8000       ; Destination address\nLD      A, $FF          ; Fill pattern\nLD      BC, 16384       ; Size (16KB)\nCALL    DMA_MemoryFill  ; ~240 T-states CPU + hardware time\n\n; Ultra-fast burst fill\nLD      HL, $C000       ; Destination\nLD      A, $00          ; Clear pattern\nLD      BC, 8192        ; Size (8KB)\nLD      D, $FF          ; Burst mode flag\nCALL    DMA_BurstFill   ; ~235 T-states CPU + hardware time\n\n; Copy large memory blocks using DMA\nLD      HL, SourceData          ; Source address\nLD      DE, DestBuffer          ; Destination address\nLD      BC, 16384               ; Size (16KB)\nCALL    DMA_MemoryCopy          ; ~300 T-states CPU + hardware time\n\n; Ultra-fast burst copy\nLD      HL, LargeBuffer         ; Source address\nLD      DE, TargetLocation      ; Destination address\nLD      BC, 32768               ; Size (32KB)\nCALL    DMA_MemoryCopy_Burst    ; ~270 T-states CPU + hardware time\n\n; Screen copying with DMA utilities\nLD      HL, OffScreenBuffer     ; Source screen\nLD      DE, SCREEN_PIXEL_BASE   ; Destination screen\nLD      BC, 6912                ; Full screen size\nCALL    DMA_MemoryCopy_Burst    ; Fastest possible screen copy\n\n; Memory preparation for double buffering\nLD      HL, Buffer1             ; Source buffer\nLD      DE, Buffer2             ; Destination buffer  \nLD      BC, 6912                ; Screen size\nCALL    DMA_MemoryCopy          ; Prepare buffer swap\n```\n\n### Hardware Detection Utilities\n\n```asm\n; Check what hardware is available and select optimal routines\nCALL    CheckOnZ80N     ; Returns NZ if Z80N available\nJR      Z, 
StandardZ80  ; Jump if not Z80N\n\n; Z80N detected - check for DMA\nCALL    CheckDMAAvailable ; Returns NZ if DMA available\nJR      Z, Z80NOnly     ; Jump if no DMA\n\n; Full Next hardware available\nLD      C, SCREEN_DMA_BURST ; Use maximum performance\nJR      PerformOperation\n\nZ80NOnly:\nLD      C, SCREEN_Z80N_COMPACT ; Use Z80N optimizations\nJR      PerformOperation\n\nStandardZ80:\nLD      C, SCREEN_ALLPUSH\n\nPerformOperation:\nLD      A, %00000111    ; Screen attribute\nLD      HL, 0           ; Default screen\nCALL    Screen_FullReset_Unified\n```\n\n## ⚡ **Z80N Performance Comparison**\n\n### 8×8 Multiplication Performance\n\n| Performance Level | T-States | Platform | Description |\n|------------------|----------|----------|-------------|\n| **PERFORMANCE_COMPACT** | 35-75 | All | Variable timing, compact code |\n| **PERFORMANCE_BALANCED** | ~160 | All | Fixed timing, predictable |\n| **PERFORMANCE_MAXIMUM** | ~120 | All | Optimized for speed |\n| **PERFORMANCE_NEXT_COMPACT** | ~14 | Next | Z80N MUL instruction |\n| **PERFORMANCE_NEXT_BALANCED** | ~29 | Next | Z80N MUL + overflow check |\n| **PERFORMANCE_NEXT_MAXIMUM** | ~20 | Next | Z80N MUL + special cases |\n\n**Performance Improvement**: Up to **85% faster** on Spectrum Next!\n\n### 16×8 Multiplication Performance\n\n| Performance Level | T-States | Platform | Description |\n|------------------|----------|----------|-------------|\n| **PERFORMANCE_COMPACT** | 45-380 | All | Variable timing, compact code |\n| **PERFORMANCE_BALANCED** | ~180 | All | Fixed timing, predictable |\n| **PERFORMANCE_MAXIMUM** | ~140 | All | Optimized for speed |\n| **PERFORMANCE_NEXT_COMPACT** | ~97 | Next | Z80N MUL instruction |\n| **PERFORMANCE_NEXT_BALANCED** | ~97 | Next | Z80N MUL + same algorithm |\n| **PERFORMANCE_NEXT_MAXIMUM** | ~97 | Next | Z80N MUL + same algorithm |\n\n**Performance Improvement**: Up to **75% faster** on Spectrum Next for balanced/maximum modes!\n\n### 8÷8 Division Performance\n\n| 
Performance Level | T-States | Platform | Description |\n|------------------|----------|----------|-------------|\n| **PERFORMANCE_COMPACT** | 25-1950 | All | Variable timing, compact subtraction |\n| **PERFORMANCE_BALANCED** | 30-1975 | All | Similar to compact, different registers |\n| **PERFORMANCE_MAXIMUM** | 40-1000 | All | Optimized with 2× acceleration |\n| **PERFORMANCE_NEXT_COMPACT** | 40-400 | Next | Z80N MUL hybrid method |\n| **PERFORMANCE_NEXT_BALANCED** | ~175 | Next | Z80N MUL 8-bit reciprocal table |\n| **PERFORMANCE_NEXT_MAXIMUM** | ~218 | Next | Z80N MUL 16-bit reciprocal table |\n\n**Performance Improvement**: Up to **90% faster** on Spectrum Next!  \n**✅ Accuracy Note**: NEXT_COMPACT and NEXT_MAXIMUM provide exact mathematical results. NEXT_BALANCED uses an 8-bit reciprocal for speed, with minor accuracy trade-offs in edge cases only. NEXT_MAXIMUM uses a 16-bit reciprocal for maximum precision. All algorithms pass comprehensive test validation.\n\n### 16÷8 Division Performance\n\n| Performance Level | T-States | Platform | Description |\n|------------------|----------|----------|-------------|\n| **PERFORMANCE_COMPACT** | 45-1300 | All | Variable subtraction, worst case 65535÷1 |\n| **PERFORMANCE_BALANCED** | 220-280 | All | Fixed binary long division, consistent timing |\n| **PERFORMANCE_MAXIMUM** | 180-420 | All | Optimized binary division with early exits |\n| **PERFORMANCE_NEXT_COMPACT** | 118-500 | Next | Z80N hybrid: 8×8 for H=0, traditional for larger |\n| **PERFORMANCE_NEXT_BALANCED** | 118-500 | Next | Uses 8-bit reciprocal table, some precision trade-off |\n| **PERFORMANCE_NEXT_MAXIMUM** | 107-520 | Next | Uses 16-bit reciprocal table, high precision |\n\n**Algorithm Selection (NEXT_COMPACT/BALANCED)**:\n- **H=0** (dividend ≤255): Uses Z80N 8×8 hybrid division\n- **H=1-15** (256-4095): Uses traditional balanced division  \n- **H≥16** (4096+): Uses traditional maximum division\n\n**Algorithm Selection (NEXT_MAXIMUM)**:\n- **H=0 and L\u003cB**: 
Direct return with quotient=0, remainder=L\n- **H=0 and L≥B**: Uses Z80N 8×8 16-bit reciprocal division for maximum precision\n- **H≠0**: Uses traditional maximum division algorithm\n\n**Performance Improvement**: Up to **65% faster** for small dividends on Spectrum Next!  \n\n### Screen Clearing Performance\n\n| Performance Level | Full Reset T-States | Platform | Improvement |\n|------------------|-------------------|----------|-------------|\n| **SCREEN_COMPACT** | 145,265 | All | Baseline |\n| **SCREEN_Z80N_COMPACT** | 96,700 | Next | 33% faster |\n| **SCREEN_ALLPUSH** | 40,344 | All | 72% faster |\n| **SCREEN_DMA_FILL** | 400 | Next | 99.7% faster |\n| **SCREEN_DMA_BURST** | 260 | Next | 99.8% faster |\n\n**DMA Performance Notes**:\n- T-States shown are CPU overhead only\n- Actual memory transfer happens in hardware parallel to CPU\n- DMA provides dramatic speed improvements for large memory operations\n- Automatic fallback to standard routines if DMA unavailable\n\n### 16-bit Operations\n\n```asm\n; 16×8 Multiplication (Standard Z80)\nLD      HL, 1000        ; 16-bit multiplicand\nLD      B, 50           ; 8-bit multiplier\nLD      C, PERFORMANCE_MAXIMUM\nCALL    Multiply16x8_Unified\n; Result in DE:HL = 50000\n\n; 16×8 Multiplication (Next Z80N - Ultra Fast!)\nLD      HL, 1000        ; 16-bit multiplicand  \nLD      B, 50           ; 8-bit multiplier\nLD      C, PERFORMANCE_NEXT_COMPACT   ; Uses Z80N MUL instruction\nCALL    Multiply16x8_Unified\n; Result in DE:HL = 50000 (75% faster!)\n\n; 16÷8 Division (Standard Z80)\nLD      HL, 1234        ; 16-bit dividend\nLD      B, 10           ; 8-bit divisor\nLD      C, PERFORMANCE_BALANCED\nCALL    Divide16x8_Unified\n; Quotient in HL = 123, remainder in A = 4\n\n; 16÷8 Division (Next Z80N - Hybrid Method)\nLD      HL, 5000        ; 16-bit dividend\nLD      B, 25           ; 8-bit divisor\nLD      C, PERFORMANCE_NEXT_COMPACT   ; Uses Z80N hybrid algorithm\nCALL    Divide16x8_Unified\n; Quotient in HL = 200, 
remainder in A = 0 (65% faster for H≥16!)\n```\n\n## 🔧 **API Reference**\n\n### Mathematical Operations\n\n#### Multiplication\n- `Multiply8x8_Unified` - 8×8 bit unsigned multiplication\n  - Standard Z80: COMPACT/BALANCED/MAXIMUM (35-160 T-states)\n  - Next Z80N: NEXT_COMPACT/NEXT_BALANCED/NEXT_MAXIMUM (10-29 T-states)\n- `Multiply16x8_Unified` - 16×8 bit unsigned multiplication\n  - Standard Z80: COMPACT/BALANCED/MAXIMUM (45-380 T-states)\n  - Next Z80N: NEXT_COMPACT/NEXT_BALANCED/NEXT_MAXIMUM (97 T-states)\n\n**Input**: A/HL = multiplicand, B = multiplier, C = performance level  \n**Output**: HL = result (8×8), DE:HL = result (16×8)  \n**Z80N Performance**: Up to 85% faster (8×8) and 75% faster (16×8) on Spectrum Next\n\n#### Division\n- `Divide8x8_Unified` - 8÷8 bit unsigned division\n  - Standard Z80: COMPACT/BALANCED/MAXIMUM (25-1975 T-states)\n  - Next Z80N: NEXT_COMPACT (40-400 T-states hybrid - subtraction \u003c128, 8-bit reciprocal ≥128)\n  - Next Z80N: NEXT_BALANCED (~175 T-states 8-bit reciprocal table)\n  - Next Z80N: NEXT_MAXIMUM (~175 T-states currently fallback to 8-bit reciprocal)\n- `Divide16x8_Unified` - 16÷8 bit unsigned division\n  - Standard Z80: COMPACT/BALANCED/MAXIMUM (45-1300 T-states)\n  - Next Z80N: NEXT_COMPACT/NEXT_BALANCED (118-500 T-states hybrid method)\n  - Next Z80N: NEXT_MAXIMUM (107-520 T-states reciprocal with fallback)\n\n**Input**: A/HL = dividend, B = divisor, C = performance level  \n**Output**: A/HL = quotient, A/B = remainder  \n**Z80N Performance**: Up to 95% faster (8÷8), 65% faster (16÷8) on Spectrum Next  \n**✅ Accuracy Note**: All division algorithms pass comprehensive test validation. 
Reciprocal methods use optimized approximation with validated accuracy for typical use cases.\n\n### Random Number Generation\n\n#### 8-bit Random\n- `Random8_Unified_Seed` - Initialize 8-bit random seed and generate first value\n- `Random8_Unified_Next` - Generate subsequent 8-bit random numbers\n\n**Input**: A = upper limit (inclusive), B = seed (seeding only), C = algorithm selection  \n**Output**: A = random value in range [0, limit]\n\n#### 16-bit Random  \n- `Random16_Unified_Seed` - Initialize 16-bit random seed and generate first value\n- `Random16_Unified_Next` - Generate subsequent 16-bit random numbers\n\n**Input**: HL = upper limit (inclusive), BC = seed (seeding only), D = algorithm selection  \n**Output**: HL = random value in range [0, limit]\n\n### Screen Clearing\n\n#### Traditional ZX Spectrum Screen Clearing\n- `Screen_FullReset_Unified` - Clear complete screen (pixels + attributes)\n- `Screen_PixelReset_Unified` - Clear pixels only, preserve attributes\n- `Screen_AttrReset_Unified` - Clear attributes only, preserve pixels\n\n**Input**: \n- HL = screen address (0 = use default screen address)\n- A = fill value (pixel value or attribute value)\n- C = performance level (SCREEN_COMPACT through SCREEN_DMA_BURST)\n\n#### Spectrum Next Layer 2 Screen Clearing\n- `Screen_FullReset_Unified` - Clear Layer 2 screen (supports all Layer 2 modes)\n\n**Input**:\n- HL = Layer 2 address (0 = use current active Layer 2)\n- A = color value (8-bit palette index)\n- C = Layer 2 performance level\n\n**Layer 2 Performance Levels**:\n- **Manual Modes**: Direct address and resolution specification\n  - **SCREEN_LAYER2_MANUAL_256by192**: 256×192 LDIRX clearing (~205,000 T-states)\n  - **SCREEN_LAYER2_MANUAL_320by256**: 320×256 LDIRX clearing (~350,000 T-states)\n  - **SCREEN_LAYER2_MANUAL_640by256**: 640×256 LDIRX clearing (~700,000 T-states)\n  - **SCREEN_LAYER2_MANUAL_DMA_256by192**: 256×192 DMA clearing (~280 T-states)\n  - **SCREEN_LAYER2_MANUAL_DMA_320by256**: 
320×256 DMA clearing (~320 T-states)\n  - **SCREEN_LAYER2_MANUAL_DMA_640by256**: 640×256 DMA clearing (~400 T-states)\n- **Automatic Modes**: Hardware detection with optimal selection\n  - **SCREEN_LAYER2_AUTO_ACTIVE**: Auto-detect mode, use LDIRX (variable T-states)\n  - **SCREEN_LAYER2_AUTO_DMA**: Auto-detect mode, use DMA burst (~280-400 T-states)\n\n**Output**: Screen area cleared with specified color value\n\n#### Legacy Individual Performance Methods (Deprecated)\n- `Screen_FullReset_CompactLDIR` through `Screen_FullReset_ALLPUSH` - Use unified versions instead\n\n#### Unified Screen Clearing with Flexible Addressing\n- `Screen_FullReset_Unified` - Clear pixels and set attributes\n- `Screen_ClearPixel_Unified` - Clear pixels only, preserve attributes  \n- `Screen_ClearAttr_Unified` - Set attributes only, preserve pixels\n\n**Input**: \n- A = attribute value (for full reset and attribute operations)\n- C = performance level\n- HL = screen base address (0 = use default 16384, other values = custom address)\n\n**Output**: Screen cleared according to specified operation\n\n**Enhanced Features**:\n- **Flexible Addressing**: Support for any memory location as screen buffer\n- **Off-Screen Rendering**: Full support for secondary screen buffers\n- **Double Buffering**: Enable smooth animations with back-buffer rendering\n- **Memory Conservation**: Efficient in-memory screen composition\n\n**Performance Levels**:\n- **SCREEN_COMPACT**: Standard LDIR operation (baseline, ~145,265 T-states)\n- **SCREEN_Z80N_COMPACT**: Z80N LDIRX optimization (33% faster, ~96,700 T-states)\n- **SCREEN_1PUSH to SCREEN_ALLPUSH**: Stack-based optimizations (37-72% faster)\n- **SCREEN_DMA_FILL**: DMA memory fill (99.7% faster, ~400 T-states CPU)\n- **SCREEN_DMA_BURST**: DMA burst mode (99.8% faster, ~260 T-states CPU)\n\n### Screen Copying\n\n#### Unified Screen Copying with Maximum Performance\n- `Screen_FullCopy_Unified` - Copy complete screen (pixels + attributes)\n- 
`Screen_PixelCopy_Unified` - Copy pixels only, preserve destination attributes\n- `Screen_AttrCopy_Unified` - Copy attributes only, preserve destination pixels\n\n**Input**: \n- HL = source address\n- DE = destination address (0 = use default screen address)\n- BC = byte count (6912 for full screen, 6144 for pixels, 768 for attributes)\n- C = performance level\n\n**Output**: Memory copied according to specified operation\n\n**Performance Levels**:\n- **SCREEN_COPY_COMPACT**: Standard LDIR operation (145,152 T-states, 24.1 FPS)\n- **SCREEN_COPY_1PUSH to SCREEN_COPY_ALLPUSH**: Stack optimizations (173,278 to 124,908 T-states, 20.2 to 27.9 FPS)\n- **SCREEN_COPY_Z80N_COMPACT**: Z80N LDIRX optimization (110,612 T-states, 31.6 FPS)\n- **SCREEN_COPY_DMA_FILL**: DMA memory copy (300 T-states, 12,500+ FPS)\n- **SCREEN_COPY_DMA_BURST**: DMA burst copy (270 T-states, 12,500+ FPS)\n\n### Input Utilities\n- `ScanKeyboard` - Comprehensive keyboard scanning across all rows\n- `WaitForKey` - Wait for any key press with optional timeout\n- `GetKeyPress` - Get current key state without waiting\n\n**Input**: Various parameters depending on function  \n**Output**: Key codes or status flags  \n**Performance**: Optimized for minimal input lag\n\n### Text Utilities  \n- `DisplayText` - Render text string to screen\n- `DisplayTextAt` - Render text at specific screen coordinates\n- `GetTextWidth` - Calculate text width for positioning\n- `ClearTextArea` - Clear specific text area\n\n**Input**: Text strings, coordinates, formatting options  \n**Output**: Text rendered to screen  \n**Performance**: Fast text rendering for real-time display\n\n### Scoring Utilities\n- `ConvertScoreToString` - Convert 16-bit score to display string\n- `DisplayScore` - Render score to screen with formatting\n- `FormatScore` - Apply padding and alignment to score string\n\n**Input**: Score values, formatting options, display coordinates  \n**Output**: Formatted score displayed on screen  \n**Performance**: Optimized for 
frequent score updates\n\n### Hardware Detection\n\n#### Utility Functions\n- `CheckOnZ80N` - Detect Z80N processor availability\n- `CheckDMAAvailable` - Detect DMA controller availability\n\n**Input**: None  \n**Output**: Z flag set if feature not available, NZ if available  \n**Performance**: 58-81 T-states depending on feature and availability\n\n### DMA Memory Operations (Spectrum Next Only)\n\n#### Memory Copy Operations\n- `DMA_MemoryCopy` - DMA-accelerated memory copying\n- `DMA_MemoryCopy_Burst` - DMA burst mode memory copying\n\n**Input**: HL = source address, DE = destination address, BC = byte count  \n**Output**: Memory copied via DMA controller  \n**Performance**: ~270-300 T-states CPU overhead + parallel hardware transfer\n\n#### Memory Fill Operations\n- `DMA_MemoryFill` - DMA-accelerated memory fill\n- `DMA_BurstFill` - DMA burst mode memory fill\n\n**Input**: HL = destination address, A = fill byte, BC = byte count, D = burst mode (BurstFill only)  \n**Output**: Memory filled via DMA controller  \n**Performance**: ~235-260 T-states CPU overhead + parallel hardware transfer\n\n### Algorithm Constants\n\n```asm\n; Performance Levels (Standard Z80)\nPERFORMANCE_COMPACT                         EQU 0\nPERFORMANCE_BALANCED                        EQU 1  \nPERFORMANCE_MAXIMUM                         EQU 2\n\n; Performance Levels (Next Z80N)\nPERFORMANCE_NEXT_COMPACT                    EQU 3\nPERFORMANCE_NEXT_BALANCED                   EQU 4\nPERFORMANCE_NEXT_MAXIMUM                    EQU 5\n\n; Screen Performance Levels\nSCREEN_COMPACT                              EQU 0    ; Standard LDIR operation\nSCREEN_1PUSH                                EQU 1    ; 2 bytes per iteration\nSCREEN_2PUSH                                EQU 2    ; 4 bytes per iteration\nSCREEN_4PUSH                                EQU 3    ; 8 bytes per iteration\nSCREEN_8PUSH                                EQU 4    ; 16 bytes per iteration\nSCREEN_ALLPUSH                              EQU 5 
   ; 256 bytes per iteration\nSCREEN_Z80N_COMPACT                         EQU 6    ; Z80N LDIRX optimization\nSCREEN_DMA_FILL                             EQU 7    ; DMA memory fill\nSCREEN_DMA_BURST                            EQU 8    ; DMA burst fill\nSCREEN_LAYER2_MANUAL_256by192               EQU 9    ; Manual Layer 2 256x192 LDIRX\nSCREEN_LAYER2_MANUAL_320by256               EQU 10   ; Manual Layer 2 320x256 LDIRX\nSCREEN_LAYER2_MANUAL_640by256               EQU 11   ; Manual Layer 2 640x256 LDIRX\nSCREEN_LAYER2_MANUAL_DMA_256by192           EQU 12   ; Manual Layer 2 256x192 DMA\nSCREEN_LAYER2_MANUAL_DMA_320by256           EQU 13   ; Manual Layer 2 320x256 DMA\nSCREEN_LAYER2_MANUAL_DMA_640by256           EQU 14   ; Manual Layer 2 640x256 DMA\nSCREEN_LAYER2_AUTO_ACTIVE                   EQU 15   ; Auto Layer 2 detection LDIRX\nSCREEN_LAYER2_AUTO_DMA                      EQU 16   ; Auto Layer 2 detection DMA\n\n; Layer 2 Display Constants\nLAYER2_REGISTER_DATA_PORT                   EQU $243B ; Next register data port\nLAYER2_REGISTER_SELECT_PORT                 EQU $253B ; Next register select port\nLAYER2_ADDRESS_REGISTER                     EQU $12   ; Layer 2 address register\nLAYER2_CONTROL_REGISTER                     EQU $15   ; Layer 2 control register\nLAYER2_BYTES_256by192                       EQU $C000 ; 48KB (256x192 mode)\nLAYER2_BYTES_320by256_HALF                  EQU $A000 ; 40KB (half of 320x256)\nLAYER2_BYTES_640by256_QTR                   EQU $A000 ; 40KB (quarter of 640x256)\n\n; Screen Copy Performance Levels\nSCREEN_COPY_COMPACT                         EQU 0    ; Standard LDIR operation\nSCREEN_COPY_1PUSH                           EQU 1    ; 2 bytes per iteration\nSCREEN_COPY_2PUSH                           EQU 2    ; 4 bytes per iteration\nSCREEN_COPY_4PUSH                           EQU 3    ; 8 bytes per iteration  \nSCREEN_COPY_8PUSH                           EQU 4    ; 16 bytes per iteration\nSCREEN_COPY_ALLPUSH                         
EQU 5    ; 256 bytes per iteration\nSCREEN_COPY_Z80N_COMPACT                    EQU 6    ; Z80N LDIRX optimization\nSCREEN_COPY_DMA_FILL                        EQU 7    ; DMA memory copy\nSCREEN_COPY_DMA_BURST                       EQU 8    ; DMA burst copy\nSCREEN_COPY_LAYER2_MANUAL_256by192          EQU 9    ; Manual Layer 2 256x192 LDIRX\nSCREEN_COPY_LAYER2_MANUAL_320by256          EQU 10   ; Manual Layer 2 320x256 LDIRX\nSCREEN_COPY_LAYER2_MANUAL_640by256          EQU 11   ; Manual Layer 2 640x256 LDIRX\nSCREEN_COPY_LAYER2_MANUAL_DMA_256by192      EQU 12   ; Manual Layer 2 256x192 DMA\nSCREEN_COPY_LAYER2_MANUAL_DMA_320by256      EQU 13   ; Manual Layer 2 320x256 DMA\nSCREEN_COPY_LAYER2_MANUAL_DMA_640by256      EQU 14   ; Manual Layer 2 640x256 DMA\nSCREEN_COPY_LAYER2_AUTO_ACTIVE              EQU 15   ; Use automatic active Layer 2 address and resolution detection and LDIRX to copy Layer 2 screen - Next only.\nSCREEN_COPY_LAYER2_AUTO_DMA                 EQU 16   ; Use automatic active Layer 2 address and resolution detection and DMA BURST to copy Layer 2 screen - Next only.\n\n; 8-bit Random Algorithms (Standard Z80)\nPERFORMANCE_STANDARD_RANDOM_LCG             EQU 0    ; Linear Congruential Generator\nPERFORMANCE_STANDARD_RANDOM_LFSR            EQU 1    ; Linear Feedback Shift Register\nPERFORMANCE_STANDARD_RANDOM_XORSHIFT        EQU 2    ; XorShift Algorithm\nPERFORMANCE_STANDARD_RANDOM_MIDDLESQUARE    EQU 3    ; Middle Square Method\n\n; 8-bit Random Algorithms (Next Z80N)\nPERFORMANCE_Z80N_RANDOM_LCG                 EQU 4    ; Z80N optimized LCG\nPERFORMANCE_Z80N_RANDOM_LFSR                EQU 5    ; Z80N optimized LFSR\nPERFORMANCE_Z80N_RANDOM_XORSHIFT            EQU 6    ; Z80N optimized XorShift\nPERFORMANCE_Z80N_RANDOM_MIDDLESQUARE        EQU 7    ; Z80N optimized Middle Square\n\n; 16-bit Random Algorithms (Standard Z80)\nPERFORMANCE_STANDARD_RANDOM16_LCG           EQU 0    ; 16-bit Linear Congruential Generator\nPERFORMANCE_STANDARD_RANDOM16_LFSR       
   EQU 1    ; 16-bit Linear Feedback Shift Register\nPERFORMANCE_STANDARD_RANDOM16_XORSHIFT      EQU 2    ; 16-bit XorShift Algorithm\nPERFORMANCE_STANDARD_RANDOM16_MIDDLESQUARE  EQU 3   ; 16-bit Middle Square Method\n\n; 16-bit Random Algorithms (Next Z80N)\nPERFORMANCE_Z80N_RANDOM16_LCG               EQU 4    ; Z80N optimized 16-bit LCG\nPERFORMANCE_Z80N_RANDOM16_LFSR              EQU 5    ; Z80N optimized 16-bit LFSR\nPERFORMANCE_Z80N_RANDOM16_XORSHIFT          EQU 6    ; Z80N optimized 16-bit XorShift\nPERFORMANCE_Z80N_RANDOM16_MIDDLESQUARE      EQU 7    ; Z80N optimized 16-bit Middle Square\n\n; DMA Constants (Next Only)\nDMA_RESET                                   EQU $C3    ; DMA reset command\nDMA_FILL                                    EQU $79    ; DMA fill transfer mode\nDMA_BURST_TRANSFER                          EQU $7F    ; DMA burst transfer mode\nDMA_BURST_CONTROL                           EQU $18    ; DMA burst control\nDMA_LOAD                                    EQU $CF    ; DMA load/start command\nDMA_BURST_LOAD                              EQU $DF    ; DMA burst load/start command\nZXN_DMA_PORT                                EQU $6B    ; Next DMA port\n```\n\n## 🧪 **Testing**\n\nNextLibrary includes comprehensive test suites:\n\n- **62 Test Cases** continually being expanded to cover more functionality\n- **Algorithm Validation** for all random number generators (8-bit and 16-bit)\n- **Performance Verification** across all performance levels\n- **Edge Case Testing** for boundary conditions\n- **Statistical Distribution Testing** for random number quality\n- **Seed Compatibility Testing** between standard and Z80N versions\n- **Screen Management Testing** for all performance levels and addressing modes\n- **DMA Validation** for hardware detection and memory operations\n- **Hardware Detection Testing** for Z80N and DMA capability verification\n\n### New Test Cases in v1.6\n\n- **Test 058**: Parameterized screen clearing with custom addresses\n- 
**Test 059**: DMA screen clearing validation (Next hardware only)\n\nRun tests using the included test framework:\n\n```asm\nINCLUDE \"Testing/TestCases.asm\"\n```\n\n## 🏗️ **Building**\n\n### Requirements\n- **sjasmplus** assembler\n- **ZX Spectrum Next** development environment (for Next-specific features)\n\n### Build Instructions\n\n```bash\n# Assemble the library\nsjasmplus --lst=NextLibrary.lst NextLibrary.asm\n\n# Build output will be generated in Output/nextlibrary.nex\n```\n\n### 🔧 **Modular Usage**\n\nNextLibrary is designed with a clear, modular structure that allows developers to extract only the routines they need:\n\n- **Selective Inclusion**: Each mathematical operation is self-contained in its own file\n- **Minimal Dependencies**: Most routines only depend on constants and variables\n- **Clean Code Structure**: Well-commented code makes extraction straightforward\n- **No Overhead**: Include only what you need for optimal memory usage\n\n**Example**: If you only need 8×8 multiplication and basic screen clearing, simply extract:\n- `Source/Multiply/Multiply8x8.asm` - The multiplication routines\n- `Source/Display/ScreenClearing.asm` - Screen clearing routines\n- Relevant constants from `Source/Constants.asm`\n- Any required variables from `Source/Variables.asm`\n\nThis modular approach ensures you can integrate specific functionality into your projects without including the entire library.\n\n\n## 📁 **Project Structure**\n\n```\nNextLibrary/\n├── Source/\n│   ├── Display/                # Screen, Layer 2, and text utilities\n│   ├── Divide/                 # Division routines  \n│   ├── DMA/                    # DMA support routines\n│   ├── Input/                  # Input handling routines\n│   ├── Multiply/               # Multiplication routines\n│   ├── Random/                 # Random number generation\n│   ├── Scoring/                # Score management\n│   ├── Testing/                # Test suites\n│   ├── Utility/                # Hardware 
detection utilities\n│   ├── ConstantsDisplay.asm    # Display and graphics constants\n│   ├── ConstantsDMA.asm        # DMA operation constants\n│   ├── ConstantsMaths.asm      # Mathematical constants\n│   ├── ConstantsRandom.asm     # Random generation constants\n│   ├── NextLibrary.asm         # Main library file\n│   ├── Variables.asm           # Global variables\n│   ├── VariablesDisplay.asm    # Display-specific variables\n│   ├── VariablesDMA.asm        # DMA operation variables\n│   └── VariablesRandom.asm     # Random generator variables\n├── Output/\n│   └── nextlibrary.nex         # Compiled library\n└── README.md                   # This file\n```\n\n## 📄 **License**\n\n**NextLibrary is available for free use under the following terms:**\n\n### Free Use License\n\nThis software is provided **FREE OF CHARGE** for any purpose, including commercial and non-commercial use. You are granted the following rights:\n\n✅ **Use**: Use this library in any project without restriction  \n✅ **Modify**: Modify the source code to suit your needs  \n✅ **Distribute**: Redistribute original or modified versions  \n✅ **Commercial Use**: Use in commercial projects without royalties or fees  \n✅ **Private Use**: Use in private/personal projects  \n\n### Disclaimer of Warranty and Liability\n\n**THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.**\n\n**IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.**\n\nBy using this library, you acknowledge that:\n\n- **No Support Obligation**: The author has no obligation to provide support, updates, or bug fixes\n- **Use at Your Own Risk**: You assume all risks associated with using this software\n- **No 
Responsibility**: The author takes no responsibility for any consequences of using this library in your projects\n- **Your Responsibility**: You are responsible for testing and validating the library's suitability for your specific use case\n\n### Attribution (Optional)\n\nWhile not required, attribution is appreciated:\n```\n\"Uses NextLibrary - Z80 Assembly Utilities by [Author Name]\"\n```\n\n### Usage Information (Encouraged)\n\n**Help the community!** If you use NextLibrary in your project, please consider:\n\n📝 **Opening an issue or discussion** in this repository with:\n- **Project Name**: What you're building\n- **Routines Used**: Which NextLibrary functions you're using (e.g., \"8×8 multiplication (Z80N), XORShift random, DMA screen clearing\")\n- **Platform Target**: 48K, 128K, +2, +3, Next, or multi-platform\n- **Brief Description**: What your project does\n- **Optional Link**: Share your project if it's public!\n\nThis information helps:\n- **Other Developers**: See real-world usage examples and inspiration\n- **Library Development**: Show the author which routines are most valuable and most used\n- **Community Building**: Connect developers using similar functionality\n- **Documentation**: Improve examples based on actual use cases\n\nExample usage report:\n```\nProject: \"RetroBlaster 2024\"\nRoutines: Multiply8x8 (NEXT_COMPACT), DMA screen clearing (BURST), Random8 XORShift, Off-screen rendering\nPlatform: ZX Spectrum Next (Using Z80N optimizations and DMA)\nDescription: Side-scrolling shooter with procedural enemies, ultra-fast multiplication, and smooth double-buffered graphics\n```\n\n**TL;DR**: Use it freely, modify it, distribute it, make money with it - just don't blame me if something goes wrong! And if you feel like sharing what you built, that's awesome! 😊\n\n## 🤝 **Contributing**\n\nContributions are welcome! 
Please ensure:\n\n1. **Code Quality**: Follow Z80 assembly best practices\n2. **Performance**: Maintain T-state accuracy documentation  \n3. **Testing**: Add test cases for new functionality\n4. **Documentation**: Update README and inline comments\n5. **Hardware Compatibility**: Test on both standard Z80 and Next hardware where applicable\n\n## 🎮 **Use Cases**\n\nNextLibrary is perfect for:\n\n- **Retro Game Development**: High-performance math for physics, scoring, and graphics\n- **System Programming**: Efficient utilities for Next-specific applications  \n- **Educational Projects**: Well-documented Z80 assembly examples\n- **Performance-Critical Code**: T-state accurate timing for real-time applications\n- **Graphics Programming**: Fast screen clearing and off-screen rendering\n- **Hardware Optimization**: Automatic detection and utilization of Next-specific features\n\n## 🔗 **Related Projects**\n\n- [ZX Spectrum Next Official](https://www.specnext.com/)\n- [sjasmplus Assembler](https://github.com/z00m128/sjasmplus)\n- [NextBuild Development Tools](https://github.com/Threetwosevensixseven/NextBuild)\n- [ZX Spectrum Next Wiki](https://wiki.specnext.dev/)\n\n## 📧 **Contact**\n\nFor questions, suggestions, or support, please open an issue on GitHub.\n\n---\n\n**NextLibrary** - *Empowering Z80 assembly development with world-class mathematics, utilities, and hardware-accelerated performance.*\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantwjadam%2Fz80nextlibrary","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantwjadam%2Fz80nextlibrary","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantwjadam%2Fz80nextlibrary/lists"}