{"id":19149240,"url":"https://github.com/hoshimin/beesynth","last_synced_at":"2025-05-07T04:43:01.282Z","repository":{"id":179886268,"uuid":"664276759","full_name":"HoShiMin/BeeSynth","owner":"HoShiMin","description":"The frequency-perfect synthesizer for a PC-speaker","archived":false,"fork":false,"pushed_at":"2023-10-05T20:45:56.000Z","size":37614,"stargazers_count":15,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-07T04:42:50.027Z","etag":null,"topics":["driver","iopl","kernel","mp3","pc-speaker","player","rust","speaker","synth","synthesizer","wav"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HoShiMin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-07-09T13:25:19.000Z","updated_at":"2024-09-26T07:09:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"88c0a5e5-3312-44de-a508-88f0f53c901f","html_url":"https://github.com/HoShiMin/BeeSynth","commit_stats":null,"previous_names":["hoshimin/beesynth"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HoShiMin%2FBeeSynth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HoShiMin%2FBeeSynth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HoShiMin%2FBeeSynth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HoShiMin%2FBeeSynth/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/HoShiMin","download_url":"https://codeload.github.com/HoShiMin/BeeSynth/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252816517,"owners_count":21808702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["driver","iopl","kernel","mp3","pc-speaker","player","rust","speaker","synth","synthesizer","wav"],"created_at":"2024-11-09T08:07:19.506Z","updated_at":"2025-05-07T04:43:01.249Z","avatar_url":"https://github.com/HoShiMin.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎵 BeeSynth Project\n## The frequency-perfect synthesizer for a PC speaker\n### ✔️ Features\n* Written in [Rust](https://www.rust-lang.org/) 🦀\n* Support playing MP3, WAV, FLAC, tracker music and more on a PC speaker.\n* Digital sound processing: pass filters, notes extraction and more.\n* Support for chaining filters in a pipeline.\n* Synthesizer with its own musical notation.\n* Extremely low-latency audio output using IOPL.\n* Batteries are included! 🔋\n\n---\n\n### ▶ Usage\n\u003e Requires Administrator rights in order to load a kernel driver.\n\nThere are a lot of options to fine tune your sound, but you can omit them all and just play your music:\n```cmd\nbeesynth.exe music.mp3\n```\nFor a detailed description of command line arguments, see here:\n* 🌊 [BeeWave](./BEEWAVE.md) - player and DSP engine for different audio formats.\n* 🎹 [BeeSynth](./BEESYNTH.md) - synthesizer which allows you to write your own music using a musical notation.   
\n\n---\n\n### **∫**  How it works\nTable of contents:\n* [PC-speaker](#pc-speaker)\n    - [Frequency generator](#frequency-generator)\n    - [Direct membrane control](#direct-membrane-control)\n* [Dealing with I/O ports from usermode](#deal-with-ports)\n    - [Input-Output Privilege Level (IOPL)](#iopl)\n    - [Input-Output Permission Bitmap (IOPB)](#iopb)\n    - [Patch EFLAGS.IOPL in the KTRAP_FRAME](#patch-iopl)\n* [Sound theory](#sound-theory)\n    - [Pulse-Code Modulation (PCM)](#pcm)\n    - [Fourier expansion](#fourier-expansion)\n    - [Multichannel approach](#multichannel-approach)\n* [Time management](#time-management)\n    - [Measure CPU frequency](#cpu-freq)\n    - [Nano-sleep](#nano-sleep)\n\n\n---\n\n### \u003ca id=\"pc-speaker\"\u003e\u003c/a\u003e ♫ PC-speaker\n\nAlmost every desktop computer has a PC speaker. It's a small piezoelectric buzzer that you hear every time your PC turns on: its beep signals that the [Power-On Self Test](https://en.wikipedia.org/wiki/Power-on_self-test) has completed.  \n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/speaker.webp\"/\u003e\u003c/p\u003e\n\nIt is controlled via [I/O ports](https://wiki.osdev.org/I/O_Ports), and its membrane can have only two positions: raised when voltage is applied to the membrane, and lowered when the voltage is removed. Using I/O ports we can control the position of the membrane and thus generate sound.\n\nSchematically it looks like this:\n\n\u003cp style=\"text-align: center;\"\u003e\u003cimg src=\"images/circuit.svg\"/\u003e\u003c/p\u003e\n\nI/O ports are the way the CPU communicates with peripherals and the chipset using two privileged instructions: `in` and `out`. 
You can find a description of all I/O ports in the specification for your chipset:\n* **Intel Chipset Family Platform Controller Hub Datasheet** (for 700 Series PCH: [Vol.1](https://www.intel.com/content/www/us/en/content-details/743835/intel-700-series-chipset-family-platform-controller-hub-datasheet-volume-1-of-2.html) and [Vol.2](https://www.intel.com/content/www/us/en/content-details/743845/intel-700-series-chipset-family-platform-controller-hub-datasheet-volume-2-of-2.html)).\n* **AMD Processor Programming Reference** ([PPR](https://www.amd.com/en/support/tech-docs)).\n\nThere are two ways to control a PC speaker: use a frequency generator and send its output to the input of the speaker, or set the position of the membrane manually. Let's consider both.\n\n---\n\n### \u003ca id=\"frequency-generator\"\u003e\u003c/a\u003e **∿** Frequency generator\nThe first way is to use the [Programmable Interval Timer (PIT)](https://wiki.osdev.org/Programmable_Interval_Timer), which generates a square wave derived from a fixed base frequency of 1'193'182 Hz. We can get a desired frequency by setting up a 16-bit divisor in the range from 1 to 65535, which gives us frequencies from 1.193182 MHz down to 19 Hz respectively.  \nWe can deduce the min and max frequencies and the relationship between the divisor and the desired frequency:  \n_**Fbase**_ = 1.193182 MHz - the fixed base frequency of the PIT.  \n_**Divisor**_ ∈ [1..65535], excluding zero because you can't divide by zero.  \n_**Fdesired**_ = _**Fbase**_ / _**Divisor**_  \n_**Fmin**_ = 1.193182 MHz / 65535 ≈ 18.207 Hz, after rounding up we get 19 Hz.  \n_**Fmax**_ = 1.193182 MHz / 1 = 1193182 Hz  \n\nFirst of all we need to prepare the PIT to generate square waves using its control port 0x43. 
Let's see its layout:  \n```\nBits        Usage\n6 and 7     Select channel:\n                0 0 = Channel 0\n                0 1 = Channel 1\n                1 0 = Channel 2\n                1 1 = Read-back command (8254 only)\n4 and 5     Access mode:\n                0 0 = Latch count value command\n                0 1 = Access mode: lobyte only\n                1 0 = Access mode: hibyte only\n                1 1 = Access mode: lobyte/hibyte\n1 to 3      Operating mode:\n                0 0 0 = Mode 0 (interrupt on terminal count)\n                0 0 1 = Mode 1 (hardware re-triggerable one-shot)\n                0 1 0 = Mode 2 (rate generator)\n                0 1 1 = Mode 3 (square wave generator)\n                1 0 0 = Mode 4 (software triggered strobe)\n                1 0 1 = Mode 5 (hardware triggered strobe)\n                1 1 0 = Mode 2 (rate generator, same as 010b)\n                1 1 1 = Mode 3 (square wave generator, same as 011b)\n0           BCD/Binary mode: 0 = 16-bit binary, 1 = four-digit BCD\n```\nThe only channel connected to the PC speaker is Channel 2.  \nWe need to select Channel 2, set the Access mode to lobyte/hibyte to work with a 16-bit divisor, and set the Operating mode to square wave generator.  
\nSo, we need to write `0xB6` or `0b10_11_011_0` to the control port `0x43`:\n```asm\n;  10_11_011_0 = 0xB6\n;  ^  ^  ^   ^\n;  |  |  |   Use 16-bit binary for a divisor\n;  |  |  Square wave generator\n;  |  Access mode: the low byte is the first, the high byte is the second\n;  Channel 2\n\nmov al, 0xB6\nout 0x43, al\n```\nNow we need to write the divisor to the Channel 2 data port `0x42` in two steps: the low part and the high part:\n```asm\ndivisor dw 0BBAAh  ; 16-bit divisor\n\nmov ax, divisor    ; al = divisor.low, ah = divisor.high\nout 0x42, al       ; port[0x42] = low\nshr ax, 8          ; al = ah\nout 0x42, al       ; port[0x42] = high\n\n; This gives us a frequency of about 24.8 Hz:\n; 1'193'182 Hz / 0xBBAA ≈ 24.8 Hz\n```\nAnd finally we have to turn on the speaker using the NMI Status and Control port `0x61` (**NMI_STS_CNT** in Intel terms or **NMI_STATUS** in AMD terms).  \n```\nBits        Usage\n7           SERR# NMI Source Status\n6           IOCHK# NMI Source Status\n5           SPKRCLK (The output of the Counter 2)\n4           Reserved in Intel, REFCLK (The output of the Counter 1) in AMD\n3           IOCHK# NMI Enable\n2           SERR# NMI Enable\n1           Speaker Data Enable:\n                0 = SPKR output is 0 (voltage is disabled)\n                1 = SPKR output is 1 (voltage is applied)\n0           Timer Counter 2 Enable:\n                0 = Counter 2 is disabled\n                1 = Counter 2 is enabled\n```\nWe are interested in bits 1 and 0. 
We need to set them to 1 to enable the PIT timer and apply voltage to the PC speaker:\n```asm\n; Enable the speaker by enabling the PIT timer\n; and applying voltage to the PC speaker:\n; port[0x61] |= 0b11\n\nin al, 0x61   ; Read the current value\nor al, 0b11   ; Set bits 1 and 0\nout 0x61, al  ; Write the new value\n\n; Mute the speaker by disabling the PIT timer\n; and removing voltage from the PC speaker:\n; port[0x61] \u0026= ~0b11\n\nin al, 0x61       ; Read the current value\nand al, 11111100b ; Reset bits 1 and 0\nout 0x61, al      ; Write the new value\n```\nWith this code, we turn on the frequency generator in the PIT that is connected to the speaker input.\n\n---\n\n### \u003ca id=\"direct-membrane-control\"\u003e\u003c/a\u003e ⇅ Direct membrane control\nThe second way is to control the position of the PC speaker's membrane directly by applying and removing voltage manually using bit 1 (_Speaker Data Enable_) in the control port `0x61`:\n```asm\n; Raise the membrane:\n; port[0x61] |= 0b10\n\nin al, 0x61   ; Read the current value\nor al, 0b10   ; Apply voltage\nout 0x61, al  ; Write the new value\n\n; Reset the membrane:\n; port[0x61] \u0026= ~0b10\n\nin al, 0x61       ; Read the current value\nand al, 11111101b ; Remove voltage\nout 0x61, al      ; Write the new value\n```\nWe don't need to enable and prepare the PIT timer in this case as we are acting as the frequency generator ourselves.\n\n---\n\n### \u003ca id=\"deal-with-ports\"\u003e\u003c/a\u003e ⮂ Deal with ports from usermode\nOn the way to deal with the speaker, we encounter the following problem: the `in` and `out` instructions are privileged and can only be executed in kernel mode. The first obvious solution is to use a kernel driver that will work with I/O ports and call it from our application. 
But it brings unwanted delays caused by creating and dispatching [IOCTL](https://learn.microsoft.com/en-us/windows/win32/devio/device-input-and-output-control-ioctl-) and [IRP](https://learn.microsoft.com/en-us/windows-hardware/drivers/gettingstarted/i-o-request-packets) requests and switching from Ring3 to Ring0 and back.\n\nBut there are two ways to allow access to I/O ports from usermode:\n* **\u003ca id=\"iopl\"\u003e\u003c/a\u003e** The first one is to use the [I/O Privilege Level (IOPL)](https://en.wikipedia.org/wiki/IOPL) flag in the [EFLAGS](https://en.wikipedia.org/wiki/FLAGS_register_(computing)) register. It can take values from 0 to 3 and defines the least privileged level ([CPL](https://en.wikipedia.org/wiki/Protection_ring#Privilege_level), also known as a ring) at which the CPU may execute the `in`, `out`, `cli` and `sti` instructions. Normally `EFLAGS.IOPL` is set to 0, which means that these instructions can be executed only from Ring0, but if we set it to 3, we will be able to execute them from usermode. Changing `EFLAGS.IOPL` is only possible from kernel mode. Linux provides a dedicated system call, [`iopl()`](https://man7.org/linux/man-pages/man2/iopl.2.html), that allows us to change the IOPL flag, but in Windows there is no way to do this without a kernel driver: you can't set the IOPL field using [`SetThreadContext`](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreadcontext) as the kernel forcibly resets it to zero.\n* **\u003ca id=\"iopb\"\u003e\u003c/a\u003e** The second one is to use the [I/O Permission Bitmap (IOPB)](https://en.wikipedia.org/wiki/Task_state_segment#I/O_port_permissions). It is a bitmap in the [Task State Segment (TSS)](https://en.wikipedia.org/wiki/Task_state_segment) that controls access to each port separately. Each bit in the bitmap corresponds to a specific I/O port. 
If the bit is set to 0, access to the port is granted, and if it is set to 1, access is denied. 32-bit Windows has three undocumented kernel functions to modify the bitmap: [`Ke386SetIoAccessMap()`](https://github.com/HighSchoolSoftwareClub/Windows-Research-Kernel-WRK-/blob/master/WRK-v1.2/base/ntos/ke/i386/iopm.c#L76), [`Ke386QueryIoAccessMap()`](https://github.com/HighSchoolSoftwareClub/Windows-Research-Kernel-WRK-/blob/master/WRK-v1.2/base/ntos/ke/i386/iopm.c#L192) and [`Ke386IoSetAccessProcess()`](https://github.com/HighSchoolSoftwareClub/Windows-Research-Kernel-WRK-/blob/master/WRK-v1.2/base/ntos/ke/i386/iopm.c#L276). You can read more about them here: https://github.com/eantcal/ioperm. These functions are absent in 64-bit Windows, but you can find and modify the 64-bit TSS manually as it also contains an IOPB.\n\n\u003ca id=\"patch-iopl\"\u003e\u003c/a\u003e\n\nAs we want to deal with I/O ports in modern 64-bit Windows, we will use the first way. First of all, we need to determine what exactly and where we have to patch. Let's consider how a thread walks between privilege levels:\n```\n... Any user code ...\nkernel32!CreateFile()\n    ntdll!NtCreateFile()\n        syscall(N)          Ring 3\n----------------------------------\n        KiSystemCall64()    Ring 0\n            [ Save the usermode context to the KTRAP_FRAME structure ]\n            [ Dispatch through the KiServiceTable ]\n                ntoskrnl!NtCreateFile()\n            [ Restore the usermode context from the KTRAP_FRAME ]\n            KiKernelSysretExit()     ; Return to the Ring3\n\n```\nWe see that the kernel saves the usermode context to the [KTRAP_FRAME](https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/ntos/amd64_x/ktrap_frame.htm) structure before calling the syscall handler and restores it before returning to the usermode. This structure resides on the bottom of the kernel stack. 
You can find the beginning of the kernel stack for the current thread using the [`IoGetInitialStack()`](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-iogetinitialstack) function. So, to find the KTRAP_FRAME structure, we need to subtract the size of the structure from the initial stack pointer, because the stack grows from higher addresses to lower ones. At this point we can modify any register in the user context and it will be applied at the restoration point. Let's patch the `EFLAGS.IOPL` flag:\n```cpp\n#include \u003cntddk.h\u003e\n\nauto* stack = static_cast\u003cunsigned char*\u003e(IoGetInitialStack());\nauto* frame = reinterpret_cast\u003cKTRAP_FRAME*\u003e(stack - sizeof(KTRAP_FRAME));\nframe-\u003eEFlags |= 0x3000; // Raise IOPL to Ring3\n```\nAfter that let's go to usermode and check whether it works:\n```cpp\n#include \u003cintrin.h\u003e\n\n// Your usermode app:\nint main()\n{\n    //\n    // Call your driver to perform patching\n    // for this thread as was stated above.\n    //\n    DeviceIoControl(...);\n\n    // Let's check:\n    _disable(); // cli\n    _enable();  // sti\n\n    return 0;\n}\n```\nBut there is a second challenge: to install a driver, you either need an [EV certificate](https://www.globalsign.com/en/code-signing-certificate/ev-code-signing-certificates) or you need to disable digital signature verification using these commands:\n```\n#\n# Requires Administrator rights and reboot.\n#\n\n# Allow installing of unsigned drivers:\nbcdedit.exe /set loadoptions DISABLE_INTEGRITY_CHECKS\nbcdedit.exe /set TESTSIGNING ON\n\n# Deny installing of unsigned drivers:\nbcdedit.exe /set loadoptions ENABLE_INTEGRITY_CHECKS\nbcdedit.exe /set TESTSIGNING OFF\n```\nLet's consider a way to patch `EFLAGS.IOPL` using already signed drivers. These may be vulnerable drivers or drivers that provide functions for editing or mapping kernel or physical memory. 
One of these is [InpOut](https://www.highrez.co.uk/downloads/inpout32/): it is signed, it's not banned by Microsoft, it works with SecureBoot enabled and it is able to map physical memory using [`ZwMapViewOfSection()`](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-zwmapviewofsection) for the `\\Device\\PhysicalMemory` object. We should map all physical memory into userspace and find the `KTRAP_FRAME` there.\n\nThe scheme will be as follows:\n1. Put the desired thread into the kernel and suspend it there. This gives us unlimited time to find its `KTRAP_FRAME` in physical memory.\n2. Make \"anchors\" in the context of our suspended thread so we know what to look for. This can be achieved using the [`SetThreadContext()`](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreadcontext) function. We can set some registers to known magic values, which we will look for later.\n3. Enumerate all physical RAM regions and map them into the usermode address space of our process. The RAM physical address space is not contiguous: it is interspersed with areas reserved for device I/O space, and accessing those areas can cause [unforeseen consequences](https://www.youtube.com/watch?v=RJN19V9-8hs). Physical memory ranges can be found in the registry key `HKEY_LOCAL_MACHINE\\HARDWARE\\RESOURCEMAP\\System Resources\\Physical Memory\\.Translated`, which consists of [`CM_RESOURCE_LIST`](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_cm_resource_list) structures.\n4. Map each physical region into userspace and find the `KTRAP_FRAME` structure in it using the anchors (magic values) from the second step.\n5. 
Once we have found the `KTRAP_FRAME` structure, we can patch it as described above, unmap the region and resume the thread.\n```cpp\n//\n// Pseudocode, error checking is omitted for simplicity.\n//\n\nstruct PhysRegion\n{\n    uint64_t base;\n    uint64_t size;\n};\n\nstd::list\u003cPhysRegion\u003e getPhysRanges()\n{\n    // Parse HKEY_LOCAL_MACHINE\\HARDWARE\\RESOURCEMAP\\System Resources\\Physical Memory\\.Translated\n    return ...;\n}\n\nstruct Mapping\n{\n    void* base;\n    size_t size;\n};\n\nMapping mapPhysRegion(const PhysRegion\u0026 physRegion)\n{\n    // Map the region using any driver that supports it.\n    return ...;\n}\n\nvoid unmapPhysRegion(const Mapping\u0026 mapping)\n{\n    // Unmap the region.\n}\n\n//\n// The given thread must be in the kernel\n// until this function has finished.\n//\nbool patchIopl(HANDLE hThread)\n{\n    CONTEXT context{};\n    context.ContextFlags = CONTEXT_ALL;\n    GetThreadContext(hThread, \u0026context);\n\n    // Save the original context:\n    const CONTEXT originalContext = context;\n\n    // Just magic values which we will look for:\n    context.Rax = 0x1ee7c0de;\n    context.Rbx = 0xc0ffee;\n    context.Rcx = 0x7ea;\n    context.Rdx = 0xcaca0;\n\n    // Set our magic anchors:\n    SetThreadContext(hThread, \u0026context);\n\n    // Wipe leftover copies of the magic values from our own stack:\n    context = {};\n\n    bool isKtrapFrameFound = false;\n\n    const auto physRanges = getPhysRanges();\n    for (const auto\u0026 physRange : physRanges)\n    {\n        const auto mapped = mapPhysRegion(physRange);\n        for (uint64_t* value = static_cast\u003cuint64_t*\u003e(mapped.base) + sizeof(KTRAP_FRAME) / sizeof(uint64_t);\n            value \u003c static_cast\u003cuint64_t*\u003e(mapped.base) + mapped.size / sizeof(uint64_t);\n            ++value)\n        {\n            if (*value != 0x1ee7c0de)\n            {\n                // It's not an anchor:\n                continue;\n            }\n\n            KTRAP_FRAME* const 
candidate = CONTAINING_RECORD(value, KTRAP_FRAME, Rax);\n            if (candidate-\u003eRbx != 0xc0ffee\n                || candidate-\u003eRcx != 0x7ea\n                || candidate-\u003eRdx != 0xcaca0)\n            {\n                // It's not an anchor:\n                continue;\n            }\n\n            // We found the KTRAP_FRAME:\n            SetThreadContext(hThread, \u0026originalContext); // Restore the original context\n            candidate-\u003eEFlags |= 0x3000; // Raise IOPL to Ring3\n            isKtrapFrameFound = true;\n            break;\n        }\n        unmapPhysRegion(mapped);\n\n        if (isKtrapFrameFound)\n        {\n            break;\n        }\n    }\n\n    if (!isKtrapFrameFound)\n    {\n        SetThreadContext(hThread, \u0026originalContext); // Restore the original context\n    }\n\n    return isKtrapFrameFound;\n}\n\n// Patch IOPL of the current thread:\nbool patchSelfIopl()\n{\n    struct ThreadInfo\n    {\n        HANDLE hThread;\n        HANDLE hThreadArrivedToKernelEvent;\n        HANDLE hPatchFinishedEvent;\n        bool ioplWasPatched;\n    };\n\n    ThreadInfo threadInfo{};\n    threadInfo.hThread = OpenThread(THREAD_ALL_ACCESS, FALSE, GetCurrentThreadId());\n    threadInfo.hThreadArrivedToKernelEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);\n    threadInfo.hPatchFinishedEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);\n\n    // Create the supplementor thread that will patch our thread as\n    // the target thread must be in kernel all the time.\n    HANDLE hPatcherThread = CreateThread(nullptr, 0, [](void* arg) -\u003e DWORD\n    {\n        auto* const info = static_cast\u003cThreadInfo*\u003e(arg);\n\n        // Wait until the target thread entered the kernel:\n        WaitForSingleObject(info-\u003ehThreadArrivedToKernelEvent, INFINITE);\n\n        // Patch its IOPL:\n        info-\u003eioplWasPatched = patchIopl(info-\u003ehThread);\n\n        // Return the target thread to usermode:\n        
SetEvent(info-\u003ehPatchFinishedEvent);\n\n        return 0;\n    }, \u0026threadInfo, 0, nullptr);\n\n    //\n    // Atomically signal that our thread has entered the kernel\n    // and wait without exiting to usermode.\n    //\n    SignalObjectAndWait(\n        threadInfo.hThreadArrivedToKernelEvent,\n        threadInfo.hPatchFinishedEvent,\n        INFINITE,\n        FALSE\n    );\n\n    WaitForSingleObject(hPatcherThread, INFINITE);\n\n    CloseHandle(hPatcherThread);\n    CloseHandle(threadInfo.hThread);\n    CloseHandle(threadInfo.hThreadArrivedToKernelEvent);\n    CloseHandle(threadInfo.hPatchFinishedEvent);\n\n    return threadInfo.ioplWasPatched;\n}\n\nint main()\n{\n    // Patch IOPL of the current thread:\n    patchSelfIopl();\n\n    // Now we can use the in/out/cli/sti instructions:\n    _disable(); // cli\n    _enable(); // sti\n\n    return 0;\n}\n```\n\n---\n\n### \u003ca id=\"sound-theory\"\u003e\u003c/a\u003e  $\\int_{}^{}$ Sound theory\nWell, now we have a way to control a PC speaker from an application. Now we need to decide _what_ to play.\n\n\u003ca id=\"pcm\"\u003e\u003c/a\u003e\n\nThe most convenient format is [WAV](https://en.wikipedia.org/wiki/WAV). You can find the format specification [here](http://soundfile.sapp.org/doc/WaveFormat/) or [here](https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html). It contains an array of samples encoded with [Pulse-Code Modulation (PCM)](https://en.wikipedia.org/wiki/Pulse-code_modulation). In other words, each sample represents the amplitude of the speaker at a particular moment in time.\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/pcm.svg\"/\u003e\u003c/p\u003e\n\nHowever, this format is only applicable to a real speaker whose diaphragm position can be controlled flexibly by changing the voltage amplitude. But the PC speaker is a simple piezoelectric buzzer which can only be turned on and off: there are no intermediate states. 
So, we need to convert the PCM amplitudes into a sequence of on/off samples.\n\nThe first obvious way is to compare a sample with zero. If the sample is greater than zero - treat it as the speaker's up position, if the sample is lower than zero - treat it as down.\n\nIt will look like this: the blue line is the original PCM signal, the green line is the signal that we will send to the speaker.\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/simple.svg\"/\u003e\u003c/p\u003e\nWe can see how much information we lose with this approach.\n\nBut we can do better. We can switch the speaker's state if the current amplitude differs from the amplitude at the previous switch by more than a given percentage. It will look like this:\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/differential.svg\"/\u003e\u003c/p\u003e\nWe see that this approach preserves a lot more information than the previous one, so the sound will have better quality.\n\n\u003ca id=\"fourier-expansion\"\u003e\u003c/a\u003e\n\nBut there is another way to play sound. As we know, a periodic function can be represented as a sum of harmonics - sine waves with different frequencies and amplitudes. This representation is called [Fourier series expansion](https://en.wikipedia.org/wiki/Fourier_series), and its continuous generalization is the Fourier transform:  \n$$\\hat{f}(\\omega)=\\frac{1}{\\sqrt{2\\pi}}\\int_{-\\infty}^{\\infty} f(x)e^{-ix\\omega}\\, dx$$\n\nWhere:  \n$f(x)$ is an infinitely long function which we want to expand.  \n$\\omega$ is a frequency of a harmonic.  \n$\\hat{f}(\\omega)$ is a complex amplitude of a harmonic with a frequency $\\omega$.  \n\nExpanding the function this way, we get a set of harmonics (frequencies) with their amplitudes that make up the signal. This set is called the spectrum.\n\nWe also remember that the PC speaker has a mode in which we can set the sound frequency using the PIT timer. 
So, we can get the dominant frequencies at any given time in our signal and play them back on the speaker.\n\nAs the wave is not the infinite function required by the analytical solution, we can use the [discrete Fourier transform](https://en.wikipedia.org/wiki/Discrete_Fourier_transform) in which the integral is replaced by a finite sum:  \n$$X_k = \\sum_{n=0}^{N-1} x_n e^{\\frac{-i 2 \\pi}{N} k n}=\\sum_{n=0}^{N-1} x_n [\\cos(2 \\pi k n / N) - i \\sin(2 \\pi k n / N)], \\space\\space\\space\\space k = 0, \\ldots, N-1.$$\n\nWhere:  \n$N$ - number of samples.  \n$x_n$ - value of the signal at time $n$.  \n$X_k$ - value of the Fourier transform at frequency $k$.  \n\nThe second part follows from [Euler's formula](https://en.wikipedia.org/wiki/Euler%2527s_formula):  \n$$e^{ix} = \\cos(x) + i \\sin(x)$$\n\nIn order to apply this to our wave file we need to create a sampling window with a given size $N$ and apply the discrete Fourier transform to it. As a result we will get an array of size $N$ where each element is the complex amplitude of the frequency that corresponds to the position of the element. The frequency of the entry is calculated as follows:  \n$$f_k = \\frac{k}{N} \\cdot f_s$$\n\nWhere:  \n$f_k$ - frequency of the element.  \n$k$ - position of the element, $k = 0,...,N-1$.  \n$N$ - size of the sampling window (e.g. 4096).  \n$f_s$ - sampling frequency (e.g. 44100 Hz for a typical WAV file).  \n\nProgrammatically, we can calculate the discrete Fourier transform using the [Fast Fourier Transform (FFT)](https://en.wikipedia.org/wiki/Fast_Fourier_transform) algorithm.\n\nAs a result, we will get an array of complex amplitudes that contribute to the audio signal in the selected sampling window. Complex numbers have two parts: real and imaginary. As the sum above shows, the real part is the amplitude of the cosine component, and the imaginary part is the negated amplitude of the sine component. 
Using the [complex plane](https://en.wikipedia.org/wiki/Complex_plane) and the [Pythagorean theorem](https://en.wikipedia.org/wiki/Pythagorean_theorem), we can calculate the modulus of a complex number:\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/complex-plane.svg\"/\u003e\u003c/p\u003e\n\n$$|z| = \\sqrt{Re(z)^2 + Im(z)^2}$$\n\nFinally, to convert the modulus of a complex amplitude into the familiar [decibels](https://en.wikipedia.org/wiki/Decibel), we can use the following formula:  \n$$dB = 20 \\cdot \\log_{10}(|z|)$$\n\nWe can demonstrate this.  \nLet's generate a test signal in [Wolfram Mathematica](https://www.wolfram.com/mathematica/):\n```mathematica\nsignal[x_] := 0.8 Sin[0.9 x] + 0.3 Sin[0.6 x ] + 0.5 Cos[0.3 x] + 0.3 Sin[x^2];\nwave = Table[signal[x], {x, 0, 512, 1}];\nListPlot[{wave}, Joined -\u003e True, PlotStyle -\u003e Line, PlotRange -\u003e All]\n```\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/wave.svg\"/\u003e\u003c/p\u003e\n\nAnd apply the discrete Fourier transform, which will give us the spectrum:\n```mathematica\nfourier = Fourier[wave];\nfourier = Take[fourier, {1, Floor[Length[fourier] / 2], 1}];\nListPlot[Sqrt[(Re[fourier])^2 + (Im[fourier])^2], Joined -\u003e True, PlotStyle -\u003e Line, PlotRange -\u003e All, Filling -\u003e Axis]\n```\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/spectrum.svg\"/\u003e\u003c/p\u003e\nThese peaks are the dominant frequencies in the sampling window. Knowing this, we can set the PIT timer to the frequency of the highest peak and play it on the speaker: it will be a mono sound.\n\n\u003ca id=\"multichannel-approach\"\u003e\u003c/a\u003e\n\nAt the same time we can extract several of the most significant frequencies and put them into several channels. 
We can switch them quickly one by one, like this:\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/multichannel.svg\"/\u003e\u003c/p\u003e\n\nThis approach gives us a way to emulate polyphonic sound.\n\n---\n\n### \u003ca id=\"time-management\"\u003e\u003c/a\u003e ⏱️ Time management\nWhen playing a sound, we need to make delays between switching the state of the speaker. Let's calculate the minimum precision required for switching between samples in a typical WAV file with a sampling rate of 44100 Hz: \n$$\\frac{1}{44100} \\approx 22.7 \\space \\mu s$$\n\nSo, to implement fast and precise delays we need more than [`Sleep()`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-sleep), which has a precision of 1 ms. If we go deeper, we can use `NtDelayExecution()` from `ntdll.dll` which has the following prototype:\n```cpp\nNTSYSAPI NTSTATUS NTAPI NtDelayExecution(IN BOOLEAN Alertable, IN PLARGE_INTEGER Interval);\n```\nIt accepts intervals in units of 100 ns, which is more than enough for us. But with such short delays, the overhead of calling functions becomes extremely high. Switching to and from the kernel, potential thread switching by the scheduler, complicated wait logic in the kernel - all of these introduce huge errors in wait time and in themselves have a large and unpredictable execution time.\n\nWe need a low-latency way to wait for a given time with very high resolution and a predictable execution time, without jumping to the kernel. One such way is to use the [CPU timestamp counter (TSC)](https://en.wikipedia.org/wiki/Time_Stamp_Counter). It's a 64-bit CPU register that counts the number of cycles since the last reset. It is incremented on each clock cycle and is not affected by frequency scaling on modern CPUs.\n\n\u003ca id=\"cpu-freq\"\u003e\u003c/a\u003e\n\nKnowing the CPU frequency we can calculate the required number of cycles to wait for a given time. 
On Intel processors, we can obtain the CPU base frequency using the [CPUID](https://en.wikipedia.org/wiki/CPUID) instruction with the _Processor Frequency Information_ leaf:\n```cpp\n#include \u003cintrin.h\u003e\n\nusing Hertz = unsigned long long;\n\nHertz getIntelBaseCpuFrequency() noexcept\n{\n    constexpr auto k_processorFrequencyInformation = 0x16;\n\n    union ProcessorFrequencyInformation\n    {\n        int raw[4]; // [eax][ebx][ecx][edx]\n        struct\n        {\n            int eax;\n            int ebx;\n            int ecx;\n            int edx;\n        } layout;\n        struct\n        {\n            // Base frequency:\n            unsigned short base;    // EAX:[15..0], in MHz\n            unsigned short eaxHigh; // EAX:[31..16], reserved\n\n            // Maximum frequency:\n            unsigned short maximum; // EBX:[15..0], in MHz\n            unsigned short ebxHigh; // EBX:[31..16], reserved\n\n            // Bus frequency:\n            unsigned short bus;     // ECX:[15..0], in MHz\n            unsigned short ecxHigh; // ECX:[31..16], reserved\n\n            unsigned int edx;       // EDX, reserved\n        } freq;\n    };\n\n    ProcessorFrequencyInformation freqInfo{};\n    __cpuid(\u0026freqInfo.raw[0], k_processorFrequencyInformation);\n\n    return static_cast\u003cHertz\u003e(freqInfo.freq.base) * 1'000'000;\n}\n```\nBut there is no corresponding CPUID leaf on AMD processors, so we have to calculate the frequency ourselves. We can poll for some known time and measure the TSC tick delta between the beginning and the end of the polling. Knowing the polling time and the number of ticks, we can calculate the CPU frequency. 
We do not use [`Sleep()`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-sleep) for this, because it calls `NtDelayExecution()` internally and the resulting syscall distorts the measurement. In contrast, [`GetTickCount()`](https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-gettickcount) and [`GetTickCount64()`](https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-gettickcount64) read the current tick count directly from the kernel shared memory [`KUSER_SHARED_DATA`](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/ns-ntddk-kuser_shared_data):\n```asm\n; ULONGLONG __stdcall GetTickCount64Kernel32()\n; {\n;     return KUSER_SHARED_DATA-\u003eTickCountLow * KUSER_SHARED_DATA-\u003eTickCountMultiplier;\n; }\n\nGetTickCount64Kernel32 proc near\n    mov     ecx, ds:7FFE0004h\n    mov     eax, 7FFE0320h\n    shl     rcx, 20h\n    mov     rax, [rax]\n    shl     rax, 8\n    mul     rcx\n    mov     rax, rdx\n    retn\nGetTickCount64Kernel32 endp\n```\nSo, we can use the following code to measure the CPU frequency for AMD processors:\n```cpp\n#include \u003cWindows.h\u003e\n#include \u003cintrin.h\u003e\n\nusing Hertz = unsigned long long;\n\nHertz getAmdBaseCpuFrequency() noexcept\n{\n    constexpr auto k_measurementCount = 5;\n    constexpr auto k_msInSec = 1000;\n    constexpr auto k_measuringIntervalMsec = 200;\n\n    unsigned long long frequencyAccumulator = 0;\n\n    //\n    // Measure the CPU frequency k_measurementCount times to get the average.\n    // You can calculate the median instead of the average for more accuracy.\n    //\n    for (auto i = 0; i \u003c k_measurementCount; ++i)\n    {\n        const auto initialTickCount = GetTickCount64();\n        const auto begin = __rdtsc();\n\n        //\n        // Poll for the given time.\n        // Avoid using _mm_pause() here, as it introduces unpredictable delays.\n        //\n        while ((GetTickCount64() - initialTickCount) \u003c 
k_measuringIntervalMsec)\n        {\n        }\n\n        const auto end = __rdtsc();\n\n        const auto elapsedCycles = end - begin;\n\n        frequencyAccumulator += elapsedCycles * k_msInSec / k_measuringIntervalMsec;\n    }\n\n    return frequencyAccumulator / k_measurementCount;\n}\n```\n\n\u003ca id=\"nano-sleep\"\u003e\u003c/a\u003e\n\nNow we can implement a function that waits for a given time using the TSC:\n```cpp\n#include \u003ccstdint\u003e\n#include \u003cintrin.h\u003e\n\n//\n// Use getIntelBaseCpuFrequency() and getAmdBaseCpuFrequency() from the above.\n//\n\nunion MaximumFunctionNumberAndVendorId\n{\n    static constexpr auto k_leaf = 0;\n\n    int raw[4]; // [eax][ebx][ecx][edx]\n\n    struct\n    {\n        unsigned int LargestStandardFunctionNumber;\n        unsigned int VendorPart1; // 'uneG' || 'htuA'\n        unsigned int VendorPart3; // 'letn' || 'DMAc' --\u003e 'GenuineIntel' or 'AuthenticAMD' (EBX + EDX + ECX)\n        unsigned int VendorPart2; // 'Ieni' || 'itne'\n\n        bool isIntel() const\n        {\n            // GenuineIntel:\n            return (VendorPart1 == 'uneG')\n                \u0026\u0026 (VendorPart2 == 'Ieni')\n                \u0026\u0026 (VendorPart3 == 'letn');\n        }\n\n        bool isAmd() const\n        {\n            // AuthenticAMD:\n            return (VendorPart1 == 'htuA')\n                \u0026\u0026 (VendorPart2 == 'itne')\n                \u0026\u0026 (VendorPart3 == 'DMAc');\n        }\n    } layout;\n};\n\nHertz getCpuFrequency() noexcept\n{\n    MaximumFunctionNumberAndVendorId vendor{};\n    __cpuid(\u0026vendor.raw[0], MaximumFunctionNumberAndVendorId::k_leaf);\n\n    if (vendor.layout.isIntel())\n    {\n        return getIntelBaseCpuFrequency();\n    }\n    \n    return getAmdBaseCpuFrequency();\n}\n\nclass NanoWait\n{\nprivate:\n    Hertz m_frequency;\n\npublic:\n    NanoWait() noexcept\n        : m_frequency(getCpuFrequency())\n    {\n    }\n\n    void nanoWait(uint64_t nsec) const noexcept\n    {\n        const auto 
cyclesToWait = nsec * m_frequency / 1'000'000'000;\n\n        const auto begin = __rdtsc();\n\n        while ((__rdtsc() - begin) \u003c cyclesToWait)\n        {\n        }\n    }\n};\n```\nUsing this waiter, we get very low latency and high accuracy; the resolution is limited by the ~15 nanoseconds it takes to execute `rdtsc` itself.\n\n### 🏁 Conclusion\nWe have collected all the necessary components to build our speaker synthesizer: we learned how to control the PC speaker from user mode, figured out digital sound processing and wrote a high-performance timer. With this knowledge, we are ready to sail on sound waves. In this project, you will find all the above techniques, which will give you an incredible experience when listening to music through a PC speaker.\n\nThank you for your attention and good luck!\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/thats-all.webp\"/\u003e\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhoshimin%2Fbeesynth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhoshimin%2Fbeesynth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhoshimin%2Fbeesynth/lists"}