{"id":20746869,"url":"https://github.com/gary7102/linux-get-physical-address","last_synced_at":"2026-04-24T14:32:20.831Z","repository":{"id":260487772,"uuid":"872754747","full_name":"gary7102/Linux-get-physical-address","owner":"gary7102","description":"Add a system call in Linux Kernel that get physical addresses from virtual addresses","archived":false,"fork":false,"pushed_at":"2024-11-24T04:08:45.000Z","size":146,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-18T03:08:28.938Z","etag":null,"topics":["copy-on-write","demand-paging","linux","linux-kernel","pagetable","systemcalls"],"latest_commit_sha":null,"homepage":"https://hackmd.io/39Jxfg5uTcOFEyt3T5lb0A?both=","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gary7102.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-15T02:41:35.000Z","updated_at":"2024-11-24T04:08:48.000Z","dependencies_parsed_at":"2024-11-14T16:36:32.256Z","dependency_job_id":"5357fb32-cef9-497f-81ea-9c68e277fec8","html_url":"https://github.com/gary7102/Linux-get-physical-address","commit_stats":null,"previous_names":["gary7102/linux-add-a-system-call","gary7102/linux-get-physical-address"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gary7102%2FLinux-get-physical-address","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gary7102%2FLinux-get-physical-address/tags","releases_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/gary7102%2FLinux-get-physical-address/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gary7102%2FLinux-get-physical-address/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gary7102","download_url":"https://codeload.github.com/gary7102/Linux-get-physical-address/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243043044,"owners_count":20226754,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["copy-on-write","demand-paging","linux","linux-kernel","pagetable","systemcalls"],"created_at":"2024-11-17T08:09:38.026Z","updated_at":"2026-04-24T14:32:20.824Z","avatar_url":"https://github.com/gary7102.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003cfont color=\"#F7A004\"\u003eIntro\u003c/font\u003e\n\n**\u003cfont size = 4\u003e2024 Fall NCU Linux OS Project 1\u003c/font\u003e**  \n\n\n* Add a system call that get physical addresses from virtual addresses\n* 介紹 `copy_from_user` 及 `copy_to_user` 使用方法  \n* 使用Copy on Write 機制來驗證system call 正確呼叫  \n* 介紹 Demand Paging 在 memory 中的使用時機  \n\nDemo問題可參考[這篇](https://hackmd.io/@gary7102/ByQDR51M1e)，[github](https://github.com/gary7102/Linux-add-a-system-call.git)\n\n**\u003cfont size = 4\u003eEnvironment\u003c/font\u003e**\n```\nOS: Ubuntu 22.04\nARCH: X86_64\nKernel Version: 5.15.137\n```\n\n\n\n# \u003cfont color=\"#F7A004\"\u003e`copy_from_user` 及 `copy_to_user`\u003c/font\u003e\n\n## 
copy_from_user\n根據[bootlin](https://elixir.free-electrons.com/linux/v5.15.137/source/include/linux/uaccess.h#L189)\n\n```c\nunsigned long copy_from_user(void *to, const void __user *from, unsigned long n);\n```\n\n這個函數的功能是將user space的資料複製到kernel space。其中:  \n`to`: 目標位址，是kernel space中的一個指標，用來存放從user space 複製過來的資料。  \n`from`:來源位址，是user space中的一個指標，指向需要被複製的資料(ex: point to virtual address)。  \n`n`: 要傳送資料的長度  \n傳回值: 0 on success, or the number of bytes that could not be copied.  \n\n## copy_to_user\n\n```c\nunsigned long copy_to_user(void __user *to, const void *from, unsigned long n);\n```\n\n這個函數的功能是將kernel space的資料複製到user space variable。其中:  \n`to`: 目標地址(user space)  \n`from`: 複製地址(kernel space)  \n`n`: 要傳送資料的長度  \n傳回值: 0 on success, or the number of bytes that could not be copied.  \n\n## Purpose\n* Prevents crashes due to invalid memory access.\n* Maintains security by ensuring memory access respects user-space permissions.\n* Enables error handling by providing feedback on failed memory operations.\n\n:::success\n這兩個 function 都是在 kernel space 中使用\n:::\n\n## Example\n\n**\u003cfont size = 4\u003e新增一個system call 作為範例\u003c/font\u003e**\n```c=1\n#include \u003clinux/kernel.h\u003e       \n#include \u003clinux/syscalls.h\u003e     \n#include \u003clinux/uaccess.h\u003e      // For copy_from_user and copy_to_user\n\nSYSCALL_DEFINE2(get_square, int __user *, input, int __user *, output) {\n    int kernel_input;\n    int result;\n\n    // Copy the input value from user space to kernel space\n    if (copy_from_user(\u0026kernel_input, input, sizeof(int))) {\n        return -EFAULT; // Return error if copy fails\n    }\n\n    // Calculate the square\n    result = kernel_input * kernel_input;\n\n    // Copy the result back to user space\n    if (copy_to_user(output, \u0026result, sizeof(int))) {\n        return -EFAULT; // Return error if copy fails\n    }\n\n    return 0; // Return success\n}\n```\n\n**\u003cfont size = 4\u003eUser code\u003c/font\u003e**\n```c=1\n#include 
\u003cstdio.h\u003e\n#include \u003csys/syscall.h\u003e\n#include \u003cunistd.h\u003e\n#include \u003cerrno.h\u003e\n\nint main() {\n    int input;\n    int output;\n\n    printf(\"Enter an integer: \");\n    if (scanf(\"%d\", \u0026input) != 1) {\n        fprintf(stderr, \"Invalid input\\n\");\n        return 1;\n    }\n\n    // Call the system call with pointers to input and output\n    long result = syscall(451, \u0026input, \u0026output);\n\n    if (result == -1) {\n        perror(\"syscall failed\");\n    }else{\n        printf(\"Input: %d, Output (Square): %d\\n\", input, output);\n    }\n\n    return 0;\n}\n```\nLine 17 passes `\u0026input` and `\u0026output`,  \nwhich correspond to the system call's `int __user *, input` and `int __user *, output`. If both copies succeed the return value is 0,  \nand the user-space `output` has already been written by `copy_to_user()`.  \n\n\n\n**\u003cfont size = 4\u003eResult:\u003c/font\u003e**  \n![image](https://hackmd.io/_uploads/HyF41Ysl1g.png)  \n\nThe system call is invoked correctly and prints the computed result.\n\n\n\n# \u003cfont color=\"#F7A004\"\u003eImplementing the system call\u003c/font\u003e\n\n## Page Table in Linux\n\nLinux page tables are multi-level, and the number of levels depends on the architecture: classic 32-bit x86 uses 2 levels (10-10-12), while x86_64 uses 4 levels (9-9-9-9-12, which adds up to only 48 bits because the top 16 bits are a sign extension) or 5 levels (9-9-9-9-9-12, 57 bits). 3-level layouts also exist. The level count is set by `CONFIG_PGTABLE_LEVELS` in the kernel config and is essentially determined by the processor architecture.\n\n- **Structure of page tables**\n    - PGD (Page Global Directory)\n    - P4D (Page 4 Directory, \u003cfont color=\"red\"\u003e5-level only\u003c/font\u003e)\n    - PUD (Page Upper Directory)\n    - PMD (Page Middle Directory)\n    - PTE (Page Table Entry)\n    \nTaking a 4-level page table as an example:  \n\n![linux_paging](https://hackmd.io/_uploads/rkIiRAVxJx.jpg)\n\n\nThe base address of the page table is stored in the CR3 register (also called PDBR, page directory base register), and the value it holds is a **physical address**. Kernel code needs a virtual address to dereference, so we use `task_struct-\u003emm-\u003epgd`, which stores the virtual address of the Page Global Directory (PGD).\n\n**Note:**\nFor what `task_struct` and `mm_struct` are, see [what is mm_struct?](#mm_struct) below.\n\nEach process has its own page table. On every context switch, CR3 is loaded with the new page table base 
addr., and writing CR3 flushes the TLB automatically, so entries from the previous process are not reused.\n\nTranslating a virtual address to a physical address therefore means walking the tables level by level,\nin the order: `pgd_t` -\u003e `p4d_t` -\u003e `pud_t` -\u003e `pmd_t` -\u003e `pte_t`\n\nFor example, to find the `p4d` entry you need `pgd_t + p4d_index`  \n```c\npgd_t *pgd;\np4d_t *p4d;\n\npgd = pgd_offset(current-\u003emm, vaddr);\np4d = p4d_offset(pgd, vaddr);\n```\nLikewise, to find the `pte` entry you need `pmd_t + pte_index`  \n```c\npte_t *pte;\n\npte = pte_offset_kernel(pmd, vaddr);\n```\n\nThe implementation of these offset functions can be read directly on [bootlin](https://elixir.bootlin.com/linux/v5.15.137/source/include/linux/pgtable.h#L88)  \n\n```c\n// include/linux/pgtable.h line 88\n\n#ifndef pte_offset_kernel\nstatic inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)\n{\n        return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);\n}\n#define pte_offset_kernel pte_offset_kernel\n#endif\n\n//...\n\n// line 106\n/* Find an entry in the second-level page table.. */\n#ifndef pmd_offset\nstatic inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)\n{\n        return pud_pgtable(*pud) + pmd_index(address);\n}\n#define pmd_offset pmd_offset\n#endif\n\n#ifndef pud_offset\nstatic inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)\n{\n        return p4d_pgtable(*p4d) + pud_index(address);\n}\n#define pud_offset pud_offset\n#endif\n\nstatic inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)\n{\n        return (pgd + pgd_index(address));\n};\n\n/*\n * a shortcut to get a pgd_t in a given mm\n */\n#ifndef pgd_offset\n#define pgd_offset(mm, address)\t\tpgd_offset_pgd((mm)-\u003epgd, (address))\n#endif\n```\nMatching the figure above, each level takes the table that the previous level's entry points to and adds the index for the current level, one level at a time.  \n\n`p4d_offset` is not defined in this file even though `pud_offset` takes a `p4d_t *p4d` argument. It is found in `arch/x86/include/asm/pgtable.h line 926`:\n```c\n// arch/x86/include/asm/pgtable.h line 926\n\n/* to find an entry in a page-table-directory. 
*/\nstatic inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)\n{\n        if (!pgtable_l5_enabled())\n                return (p4d_t *)pgd;\n        return (p4d_t *)pgd_page_vaddr(*pgd) + p4d_index(address);\n}\n```\n\n\n\n根據上述對linux中page table介紹，便可以寫出page table walk 的程式碼\n\n## Page Table walk 實作\n\n新增一個檔案叫 `project1.c`，路徑為 `kernel/project1.c`\n:::spoiler \u003cfont color = green\u003e範例\u003c/font\u003e\n\n```c\n#include \u003clinux/syscalls.h\u003e\n#include \u003clinux/mm.h\u003e\n#include \u003clinux/highmem.h\u003e\n#include \u003clinux/sched.h\u003e\n#include \u003clinux/pid.h\u003e\n#include \u003clinux/mm_types.h\u003e\n#include \u003casm/pgtable.h\u003e\n\nSYSCALL_DEFINE2(my_get_physical_addresses,\n                void *, user_vaddr, \n                unsigned long *, user_paddr) {\n    \n    unsigned long vaddr;\n    unsigned long paddr = 0;\n    pgd_t *pgd;\n    p4d_t *p4d;\n    pud_t *pud;\n    pmd_t *pmd;\n    pte_t *pte;\n    unsigned long page_addr = 0;\n    unsigned long page_offset = 0;\n\n    // Copy the virtual address from user space to kernel space\n    if (copy_from_user(\u0026vaddr, user_vaddr, sizeof(unsigned long))) {\n        printk(\"Error: Failed to copy virtual address from user space\\n\");\n        return -EFAULT;\n    }\n\n    // Get the PGD (Page Global Directory) for the current process\n    pgd = pgd_offset(current-\u003emm, vaddr);\n    if (pgd_none(*pgd) || pgd_bad(*pgd)) {\n        printk(\"PGD entry not valid or not present\\n\");\n        return -EFAULT;    // #define\tEFAULT\t\t14\t /*Bad address*/\n    }\n\n    // Get the P4D (Page 4 Directory)\n    p4d = p4d_offset(pgd, vaddr);\n    if (p4d_none(*p4d) || p4d_bad(*p4d)) {\n        printk(\"P4D entry not valid or not present\\n\");\n        return -EFAULT;\n    }\n    // Get the PUD (Page Upper Directory)\n    pud = pud_offset(p4d, vaddr);\n    if (pud_none(*pud) || pud_bad(*pud)) {\n        printk(\"PUD entry not valid or not present\\n\");\n        return 
-EFAULT;\n    }\n\n    // Get the PMD (Page Middle Directory)\n    pmd = pmd_offset(pud, vaddr);\n    if (pmd_none(*pmd) || pmd_bad(*pmd)) {\n        printk(\"PMD entry not valid or not present\\n\");\n        return -EFAULT;\n    }\n\n    // Get the PTE (Page Table Entry)\n    pte = pte_offset_kernel(pmd, vaddr);\n    if (!pte_present(*pte)) {\n        printk(\"Page not present in memory\\n\");\n        return -EFAULT;\n    }\n\n    // Compute physical address from PTE\n    page_addr = pte_val(*pte) \u0026 PTE_PFN_MASK;\n    page_offset = vaddr \u0026 ~PAGE_MASK;\n    paddr = page_addr | page_offset;\n\n    // Copy the result back to user space\n    if (copy_to_user(user_paddr, \u0026paddr, sizeof(unsigned long))) {\n        printk(\"Error: Failed to copy physical address to user space\\n\");\n        return -EFAULT;\n    }\n\n    return 0;\n}\n```\n:::\n\n\n\n## 地址轉換trace code:\n\n### 第一層轉換PGD:\n\u003e目標 : 回傳PGD entry的virtual address\n\n**\u003cfont size = 4\u003e程式碼:\u003c/font\u003e**\n```c\npgd = pgd_offset(current-\u003emm, vaddr);\n```\n\n**\u003cfont size = 4\u003etrace code:\u003c/font\u003e**\n\n![image](https://hackmd.io/_uploads/Bku2LGAb1e.png)\n\n![image](https://hackmd.io/_uploads/S11p8fAWJg.png)\n\n\n由`current-\u003emm-\u003epgd`找出PGD的base address再加上`pgd_index` 計算出pgd entry的虛擬位置，回傳指標。\n\n\n:::success\n### \u003cfont color= \"#008000\"\u003eHow to get `pgd_index`?\u003c/font\u003e\n根據 [bootlin](https://elixir.bootlin.com/linux/v5.15.137/source/include/linux/pgtable.h#L85) \n```c\n#ifndef pgd_index\n/* Must be a compile-time constant, so implement it as a macro */\n#define pgd_index(a)        (((a) \u003e\u003e PGDIR_SHIFT) \u0026 (PTRS_PER_PGD - 1))\n#endif\n```\n其中\n* `#define pgd_index(a)`：定義 `pgd_index` Macro，接受一個參數 `a`，代表一個virtual address\n* `(((a) \u003e\u003e PGDIR_SHIFT) \u0026 (PTRS_PER_PGD - 1))`：這是用來計算 `a` 在 PGD 中的index的表達式。\n\n**\u003cfont size = 4\u003e舉例：\u003c/font\u003e**  \n在x86_64架構的 `PGDIR_SHIFT` 為 39 (48 - 9)，\n且`PTRS_PER_PGD` 為 
512, so `pgd_index(a)` works as follows:\n\n* Shift the virtual address `a` right by 39 bits to extract the high-order bits that select the PGD entry\n* Bitwise `\u0026` the result with `511` (`PTRS_PER_PGD - 1`) so the index stays within the valid range\n\nThe result is the `pgd_index` of virtual address `a`,\nand `p4d_index`, `pud_index`, `pmd_index` and `pte_index` are computed the same way.\n:::\n\n### Second-level translation P4D (p4d is only used with 5-level paging; here the passed-in pgd * is returned directly)\n**\u003cfont size = 4\u003eCode:\u003c/font\u003e**\n```c\np4d = p4d_offset(pgd, vaddr);\n```\n\n\n**\u003cfont size = 4\u003etrace code:\u003c/font\u003e**\n`// arch/x86/include/asm/pgtable.h line 926`\n\n\n![image](https://hackmd.io/_uploads/HyqCLfAWke.png)\n\n\n`pgtable_l5_enabled()` checks whether 5-level paging is enabled. On a 4-level system there is no separate `p4d_t` table to access, so the function returns `(p4d_t *)pgd` directly,  \nwhich means that with 4 levels `pgd = p4d`.  \nBy the same logic, with 3 levels `pgd = p4d = pud`.\n\n\n###  Third-level translation PUD \n\u003eGoal: use *pgd and the pud index to find the virtual address of the PUD entry\n\n**\u003cfont size = 4\u003eCode:\u003c/font\u003e**\n```c\npud = pud_offset(p4d, vaddr);\n```\n\n**\u003cfont size = 4\u003etrace code:\u003c/font\u003e**\n```c\n// include/linux/pgtable.h line 115\n#ifndef pud_offset\nstatic inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)\n{\n        return p4d_pgtable(*p4d) + pud_index(address);\n}\n#define pud_offset pud_offset\n#endif\n```\n\n![image](https://hackmd.io/_uploads/SJlWDG0bkx.png)\n\n\n![image](https://hackmd.io/_uploads/BynZPfR-ye.png)\n\nA macro first checks whether CONFIG_PGTABLE_LEVELS is greater than 4 (that is, whether the p4d table is really used).\nOur setup uses 4-level translation, so the function actually used is the one at line 349 below, not the one at line 337.\n\n `/ arch / x86 / include / asm / pgtable_types.h`\n \n![image](https://hackmd.io/_uploads/Hkym9MRZJx.png)\n\n\n![image](https://hackmd.io/_uploads/ry2Q9fAZye.png)\n\n\n**The pgd entry used for this lookup is the one obtained in the first level (p4d = pgd): a pointer, expressed as a virtual address, to a pgd entry.**\n\n\n\n\n\n### __va() trace code:\n\n`/ arch / x86 / include / asm / page.h`\n\n![image](https://hackmd.io/_uploads/ryNswz0Zyg.png)\n\n\n`__va()` adds `PAGE_OFFSET`, the start of the kernel's direct mapping in virtual address space, to a physical address, which yields the corresponding virtual address by a constant offset.\n\n![image](https://hackmd.io/_uploads/BJDTAzA-yx.png)\n\n###  Fourth-level translation PMD \n\u003e目標 : 使用*pud與pmd 
index找到之PMD entry的virtual address\n\n**\u003cfont size = 4\u003e程式碼:\u003c/font\u003e**\n```c\npmd = pmd_offset(pud, vaddr);\n```\n\n\n\n**\u003cfont size = 4\u003etrace code:\u003c/font\u003e**\n\n```c\n//arch/x86//include/linux/pgtable.h line 106\n/* Find an entry in the second-level page table.. */\n#ifndef pmd_offset\nstatic inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)\n{\n        return pud_pgtable(*pud) + pmd_index(address);\n}\n#define pmd_offset pmd_offset\n```\n\n![image](https://hackmd.io/_uploads/HyCovzRWke.png)\n\n\n這裡傳入的pud是透過virtual address指向一個pud entry\n\n![image](https://hackmd.io/_uploads/ryxTvMAZyx.png)\n\n可以看到這裡一樣會檢查判斷CONFIG_PGTABLE_LEVELS是否大於3(pud table是否有啟用)\n\n![image](https://hackmd.io/_uploads/Bk3Tvf0bJg.png)\n\n\n這裡因為我們`CONFIG_PGTABLE_LEVELS = 4`，故執行的是363行而不是375行的`native_pud_val()`\n\n\n###  第五層轉換PTE \n\u003e目標 : 使用*pmd與pte index找到之PTE entry的virtual address\n\n**\u003cfont size = 4\u003e程式碼:\u003c/font\u003e**\n```c\npte = pte_offset_kernel(pmd, vaddr);\n```\n\n**\u003cfont size = 4\u003etrace code:\u003c/font\u003e**\n\n```c\n// include/linux/pgtable.h line 88\n\n#ifndef pte_offset_kernel\nstatic inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)\n{\n        return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);\n}\n#define pte_offset_kernel pte_offset_kernel\n#endif\n```\n\n`/ arch / x86 / include / asm / pgtable.h`\n\n![image](https://hackmd.io/_uploads/BJ2CvfRb1x.png)\n\n\n\n\n\n\n### 由PTE table找到實體記憶體位置\n\u003e目標 : 由*pte與pte index找到pte entry中存放的physical address\n\n**\u003cfont size = 4\u003e程式碼:\u003c/font\u003e**\n```c\npage_addr = pte_val(*pte) \u0026 PTE_PFN_MASK;\npage_offset = vaddr \u0026 ~PAGE_MASK;\npaddr = page_addr | page_offset;\n```\n\n**\u003cfont size = 4\u003etrace code:\u003c/font\u003e**\n\n![image](https://hackmd.io/_uploads/Syay_GAZ1e.png)\n\n![image](https://hackmd.io/_uploads/SkQlOGAWJx.png)\n\n\n這裡透過的`pte_val()`得到pte table entry中的內容。\n\n\n\n\n\n\n\n\n## 計算physical 
address\n\n:::success\n**\u003cfont color = \"green\"\u003eA worked example for lines 64~66\u003c/font\u003e**\n\n**\u003cfont size = 4\u003eCreate test.c\u003c/font\u003e**\n\n```c\n#include \u003cstdio.h\u003e\n#include \u003csys/syscall.h\u003e      /* Definition of SYS_* constants */\n#include \u003cunistd.h\u003e\n\nvoid * my_get_physical_addresses(void *vaddr_of_a){\n        unsigned long paddr = 0;\n\n        /* The kernel side uses copy_from_user(), so pass a pointer to the address */\n        long result = syscall(450, \u0026vaddr_of_a, \u0026paddr);\n        if (result != 0)\n                return NULL;    /* translation failed */\n\n        return (void *)paddr;\n}\n\nint main()\n{\n    int a = 10;\n    printf(\"Virtual addr. of arg a = %p\\n\", (void *)\u0026a);\n    printf(\"Physical addr. of arg a = %p\\n\", my_get_physical_addresses(\u0026a));\n    return 0;\n}\n```\n\n**\u003cfont size = 4\u003eResult:\u003c/font\u003e**  \n![image](https://hackmd.io/_uploads/HyFZkBsZkx.png)\n\n\n**\u003cfont size = 4\u003eUse dmesg to view the kernel messages\u003c/font\u003e**  \n\n![image](https://hackmd.io/_uploads/ryaNJriWkl.png)\n\n![image](https://hackmd.io/_uploads/r1iZv-IWJe.png)\n\n\nThe virtual address is `0x7fffd5bd1544`,  \n`pte_val(*pte)` = the raw PTE contents = `0x8000000093567867`.  \nIn addition, `PTE_PFN_MASK` = `0x000FFFFFFFFFF000`: the page size is 4KB, so the low 12 bits are cleared and **bits 12 to 51 are kept (40 bits in total), see the figure above** or [physical memory range](##physical_memory範圍) at the bottom.\n```c=64\npage_addr = pte_val(*pte) \u0026 PTE_PFN_MASK;\n```\nThis gives `page_addr` = `0x8000000093567867` \u0026 `0x000FFFFFFFFFF000` = `0x93567000`  \n`page_addr` is the **base address of the physical page frame**\n\n```c=65\npage_offset = vaddr \u0026 ~PAGE_MASK;\n```\n`page_offset` = `0x7fffd5bd1544` \u0026 `0x0000000000000FFF` = `0x544`  \nwhich is the **offset within the physical page frame**\n\n```c=66\npaddr = page_addr | page_offset;\n```\nFinally, physical address = `0x93567000` | `0x544` = `0x93567544`\n    \n**\u003cfont size = 4\u003eIn short, compute the page frame address and add the offset; the code simply uses bitwise `\u0026` and `|` to do so\u003c/font\u003e**\n:::\n\n\n:::warning\nBecause the system call uses `copy_from_user()`, it must be passed a pointer to the virtual address of `a`,  \n所以即使 `my_get_physical_addresses(void 
*vaddr_of_a)`中的`*vaddr_of_a`已經是pointer，  \n但是在呼叫system calls時，`long result = syscall(450, \u0026vaddr_of_a, \u0026paddr);`  \n需要傳送的參數是`\u0026vaddr_of_a`(i.e. pointer of of virtual address of `a`)\n:::\n\n\n## Add system call\n\n**\u003cfont size = 5\u003e1. Modified Makefile\u003c/font\u003e**\n\n修改 `kernel/Makefile`，增加 `project1.o`\n```\nobj-y     = fork.o exec_domain.o panic.o \\\n            cpu.o exit.o softirq.o resource.o \\\n            sysctl.o capability.o ptrace.o user.o \\\n            signal.o sys.o umh.o workqueue.o pid.o task_work.o \\\n            extable.o params.o \\\n            kthread.o sys_ni.o nsproxy.o \\\n            notifier.o ksysfs.o cred.o reboot.o \\\n            async.o range.o smpboot.o ucount.o regset.o \\\n            project1.o \\\n```\n使得在編譯時也會編譯到`project1`這個檔案\n\n**\u003cfont size = 5\u003e2. Modified syscall Table\u003c/font\u003e**\n\n要新增自己的 system call，打開`arch/x86/entry/syscalls/syscall_64.tbl`\n在第 374 行後面新增自己的 system call：\n```\n450     common  my_get_physical_addresses       sys_my_get_physical_addresses\n```\n這行有四個部分，每項之間由空白或 tab 隔開，它們代表的意義是：\n\n* `450`\nsystem call number，在使用系統呼叫時要使用這個數字\n* `common`\n支援的 ABI， 只能是 64、x32 或 common，分別表示「只支援 amd64」、「只支援 x32」或「都支援」\n* `my_get_physical_addresses`\nsystem call 的名字\n* `sys_my_get_physical_addresses`\nsystem call 對應的實作，kernel 中通常會用 sys 開頭來代表 system call 的實作\n\n`syscall_64.tbl` 這個檔案會在編譯階段被讀取後轉為 header file 檔案位於: `arch/x86/include/generated/asm/syscalls_64.h`：  \n![image](https://hackmd.io/_uploads/rJE4StogJl.png)\n\n\n**\u003cfont size = 5\u003e3. 
Modified `syscalls.h`\u003c/font\u003e**\n\n將 syscall 的原型添加進檔案 (`#endif` 之前)\n路徑為: `include/linux/syscalls.h`  \n\n![image](https://hackmd.io/_uploads/HyH4IFoeJg.png)\n\n這定義了我們system call的prototype，`asmlinkage`代表我們的參數都可以在stack裡取用，\n當 assembly code 呼叫 C function，並且是以 stack 方式傳參數時，在 C function 的 prototype 前面就要加上 `asmlinkage`\n\n# \u003cfont color=\"#F7A004\"\u003eCompile Kernel\u003c/font\u003e\n\n請參考 [add a system call](https://hackmd.io/aist49C9R46-vaBIlP3LDA?view)\n\n# \u003cfont color=\"#F7A004\"\u003eCopy on Write\u003c/font\u003e\n\n* **\u003cfont size = 4\u003eCopy on write:\u003c/font\u003e** allows multiple processes to share the same physical memory until one intends to modify it.\n\n![螢幕擷取畫面 2024-11-08 154515](https://hackmd.io/_uploads/Hy8Qzrsb1l.png)\n\n\n可以看到程式執行時，parent process、child process中 `global_a` 的physical memory都是共用的，直到`global_a`被改動之後，os會分配新的physical memory 給改動的process，也因此驗證了system call 確實有正確呼叫\n\n\n\n# \u003cfont color=\"#F7A004\"\u003eLoader\u003c/font\u003e\n\n進入這章節前，先快速介紹Linux 中的Demand paging機制，可以對應到老師之前介紹的lazy allocation，不過lazy allocation相對廣義一些，demand paging 單純在memory 中使用\n\n* __Demand Paging__:  pages of a process's memory are loaded into physical memory __only when they are actually needed__(ex: when the process tries to access them)\n\n簡單來說，並不是一開始所有的virtual address都有對應到physical address，而是等到需要使用(access)時才載入到physical memory\n\n因此，以\u003cfont color = \"red\"\u003e**process是否access the item**\u003c/font\u003e作為區分，可以分為下列幾種情況:\n\n## \u003cfont color = \"green\"\u003ecase 1:\u003c/font\u003e Array store in bss segment \n```c\n// global variable\nint a[2000000];   // store in bss segment,\n                  // same as  int a[2000000] = {0}; \n```\n**執行結果:**  \n![image](https://hackmd.io/_uploads/Hy2hMHjZJg.png)\n\n可以看到，存放在 bss segment 的 array，\nLoad到memory中的只有到 `a[1007]`，之後就沒有load 進memory，因此沒有分配physical memory\n\n\n\n## \u003cfont color = \"green\"\u003ecase 2:\u003c/font\u003e Array store in data segment\n```c\n// global variable\nint 
a[2000000] = {1};  // initialized variable, store in Data segment\n```\n**Result:**  \n![image](https://hackmd.io/_uploads/H1g4XBsWJe.png)\n\n\nBecause the first element has an initial value, a few pages of array `a` are already loaded into memory, but only a few. The rest, which have not been accessed yet, must be brought in by page faults, so the output stops at `a[15351]`.\n\n**\u003cfont size = 5\u003eNote:\u003c/font\u003e**   \nSince the array was loaded up to `a[15351]`, I tried touching `a[15352]` beforehand to trigger a page fault and load it into physical memory, to see what happens\n```c\na[15352] = 1;     // triggers a page fault, loaded into physical memory\n```\n**Result:**  \n![image](https://hackmd.io/_uploads/SJhooBsWJl.png)\n\nThe loaded range now ends at `a[16375]`, and `a[16376]` has not been accessed yet,\nso:\n```\n16375 - 15351 = 1024    \n```\n\nThe page size is 4KB and one `int` is 4 bytes, so a single page holds $$\\dfrac{4KB}{4B} = \\dfrac{2^{12}}{2^2} = 2^{10} = 1024$$ `int` elements: the page fault brought in exactly one more 4KB page.\nSeparately, on a 64-bit architecture each page table entry is 8 bytes, so one page-table page holds $$\\dfrac{4KB}{8B} = \\dfrac{2^{12}}{2^3} = 2^9 = 512$$ entries.\n\nThis confirms what the instructor explained in class.\n\n\n## \u003cfont color = \"green\"\u003ecase 3:\u003c/font\u003e loop through array\n\n```c\n// in local \nfor(int i=0; i\u003c2000000; i++)\n{\n    a[i] = 0;    //pre-accessing the array\n}\n```\n**Result:**  \n![image](https://hackmd.io/_uploads/B1CRwyBg1g.png)\n\nIn this particular case, whether the array is defined in the data segment or the BSS segment, the loop touches every element, which causes page faults and forces the pages into memory, so every element of the array has a physical address.\n\n\n# \u003cfont color=\"#F7A004\"\u003eNote\u003c/font\u003e\n\n## \u003cfont color = \"#008000\"\u003eBSS segment vs Data segment\u003c/font\u003e\nThe BSS segment holds **uninitialized global variables (initialized with 0)** and **uninitialized static variables**. The difference between the bss segment and the data segment can be seen in [case 1](##case1) and [case 2](##case2): in those runs, more pages of the data segment were already present right after the program was loaded, so more memory was mapped.\n```\n// global variables\n\nint a[100];               // bss segment\nint a[100] = {0};         // bss segment\nstatic int global_var2;   // bss segment\nint a[100] = {1};         // Data segment\n```\n\n```\n變數宣告位置\t       是否初始化\t  存放區段\n全域變數\t                未初始化\t.bss segment\n全域變數\t                有初始化\t.data 
segment\n靜態變數（static）\t 未初始化\t.bss segment\n靜態變數（static）\t 有初始化\t.data segment\n區域變數（local）\t       不論是否初始化\tStack（堆疊）區域\n```\n\n\n## \u003cfont color=\" #008000\"\u003emm_struct\u003c/font\u003e\n\n**\u003cfont size = 5\u003eWhat is `mm_struct`?\u003c/font\u003e**\n\ntask_struct 被稱為 process descriptor，因為其記錄了這個 process所有的context(ex: PID, scheduling info)，其中有一個被稱為 memory descriptor的結構 `mm_struct`，記錄了Linux視角下管理process address的資訊(ex: page tables)。  \n![30528e172c325228bf23dec7772f0c73](https://hackmd.io/_uploads/SkgMiSY1Jg.png)  \n圖源: [Linux源码解析-内存描述符（mm_struct）](https://blog.csdn.net/tiankong_/article/details/75676131)\n\n因此 `struct mm_struct *mm = current-\u003emm;` 指的是存取目前process的memory management 資訊 \n\nBy assigning `current-\u003emm` to this pointer, now can access to the memory-related information (ex: page tables) for the process that is running the system call.\n\n\n**\u003cfont size = 5\u003eWhat is `task_struct`?\u003c/font\u003e**  \n\n根據 [bootlin](https://elixir.bootlin.com/linux/v5.15.137/source/include/linux/sched.h#L721) \n\n在 Linux 中，Process Descriptor的data structure是 `task_struct`，每個正在運行或等待的process都對應一個 `task_struct`  \n\n其中比較常見的有:  \n```c\nstruct task_struct {\n    pid_t pid;                  // process ID\n    pid_t tgid;                 // thread ID\n    long state;                 // process state\n    struct mm_struct *mm;       // memory descriptor\n    struct files_struct *files; // 文件描述符\n    struct fs_struct *fs;       // 文件系統信息\n    int prio;                   // 優先級\n    struct cred *cred;          // 權限信息\n    struct signal_struct *signal; // 信號處理\n    // ... \n};\n```\n\n\n## \u003cfont color=\" #008000\"\u003eSYSCALL_DEFINE\u003c/font\u003e\n\n**\u003cfont size = 4\u003eWhat is `SYSCALL_DEFINE2`?\u003c/font\u003e**\n根據 [bootlin](https://elixir.bootlin.com/linux/v5.15.137/source/include/linux/syscalls.h#L217)定義:\n\n```c\n#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)\n#define SYSCALL_DEFINE2(name, ...) 
SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)\n#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)\n#define SYSCALL_DEFINE4(name, ...) SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)\n#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)\n#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)\n\n#define SYSCALL_DEFINE_MAXARGS\t6\n```\n\n其中`SYSCALL_DEFINE1(name, ...)` 中的\n* `1`表示system call 參數的個數，依此類推2、3、4、5、6 表示參數個數\n* `name` 表示系統呼叫system call的名字\n\n而後面的 `SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)` 中的\n* `_##name` 是一個預處理器拼接操作，會將 `_` 和 `name` 組合成一個標識符，  \n例如，如果kernel中使用了 \n```\nSYSCALL_DEFINE1(my_get_physical_addresses, void *ptr)\n```\n則這個 Macro 會展開為：\n```\nasmlinkage long sys_my_get_physical_addresses(void *ptr);\n```\n* `__VA_ARGS__` 代表傳入的參數\n\n\n## \u003cfont color= \"#008000\"\u003eHow does the kernel set register `cr3`\u003c/font\u003e\nrefrence: [stackoverflow](https://stackoverflow.com/questions/45239165/how-does-the-kernel-set-register-cr3)\n\n\nStackoverflow Reply:\n\n    the page tables are found in kernel address space and the kernel keeps a close track of the virtual-\u003ephysical mapping there.\n\n\n    Linux differentiates between two types of virtual addresses in the kernel:\n\n    Kernel virtual addresses - which can map (conceptually) to any physical address; and\n\n    Kernel logical addresses - which are virtual addresses that have a linear mapping to physical addresses\n\n\n\n    The kernel places the page tables in logical addresses, so you only need to focus on those for this discussion.\n\n    Mapping a logical address to its corresponding physical one requires only the subtraction of a constant (see e.g. 
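the __pa macro in the Linux source code).\n\n    For example, on x86, physical address 0 corresponds to logical address 0xC0000000, and physical address 0x8000 corresponds to logical address 0xC0008000.\n\n    So once the kernel places the page tables in a particular logical address, it can easily calculate which physical address it corresponds to.\n\n\n\n\nThe kernel has two kinds of virtual addresses: kernel virtual addresses and kernel logical addresses. Page tables live in kernel logical addresses, where the simple constant offset makes virtual-to-physical conversion fast.\n\n\nTracing the code on bootlin:\na kernel logical address is converted to a physical address with __pa()\n\n\n`/ arch / x86 / include / asm / page.h`\n\n![image](https://hackmd.io/_uploads/B1vQtMR-Jg.png)\n\n\nContinuing the trace\n\n![image](https://hackmd.io/_uploads/S1QEKfCWJe.png)\n\n\nIn page_32.h\n\n![image](https://hackmd.io/_uploads/SyBBKG0bJx.png)\n\n\nOn 32-bit, __pa() subtracts PAGE_OFFSET from the virtual address \n\n\nIn page_64.h\n\n![image](https://hackmd.io/_uploads/rJkIYfA-ke.png)\n\n\n\n![image](https://hackmd.io/_uploads/Hy48YGAWkg.png)\n\n\n\n\n\nHere y = x - __START_KERNEL_map. When x \u003c y, x is a virtual address below the start of the kernel mapping: the unsigned subtraction wrapped around (a borrow occurred), so y is not a meaningful offset by itself. To get the correct physical address, (__START_KERNEL_map - PAGE_OFFSET) is added to y.\n\n__START_KERNEL_map is the start address of the kernel mapping and PAGE_OFFSET is the start of the kernel's direct mapping. Adding (__START_KERNEL_map - PAGE_OFFSET) to y cancels the earlier subtraction, so the result equals x - PAGE_OFFSET, which is the physical address.\n\nThe statement x = y + (__START_KERNEL_map - PAGE_OFFSET); therefore handles virtual addresses that are translated through the PAGE_OFFSET mapping rather than the __START_KERNEL_map mapping.\n\n\n\n## \u003cfont color= \"#008000\"\u003eWhat does the `CR2` register do?\u003c/font\u003e\n\nRole of the CR2 register:\n\nOn x86, when a page fault occurs the processor automatically writes the faulting virtual address into the cr2 register. This is the address the system tried to access but that is not mapped to physical memory.\n\nPage fault handling flow:\n\nWhen the present bit (flag) of a page table entry (PTE) is 0, the page is not mapped to physical memory, so a page fault is raised.\nThe page fault handler reads the virtual address in cr2 to learn which address caused the fault.\nWhile handling the fault, the kernel may find a free page frame in physical memory and load the needed page contents into it from disk (or other secondary storage).\nFinally, the kernel uses the virtual address from cr2 to update the corresponding page table entry so that it maps to the newly loaded page frame, and sets the present flag to 1 so that later accesses no longer fault.\n\n# \u003cfont color=\"#F7A004\"\u003eProblems\u003c/font\u003e\n\n## \u003cfont 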
color=\"#008000\"\u003ephysical_memory範圍\u003c/font\u003e\n假設我們給予8GB 記憶體空間，那麼8GB = 8,589,934,592 Bytes = 0x2 0000 0000   \n因此最高能分配到的記憶體位置為 0x1 FFFF FFFF  \n![image](https://hackmd.io/_uploads/rJlIAySe1l.png)\n\n以上圖為例，physical address 明顯超出記憶體範圍，原因如下圖所述，並不是所有的bit都為實體記憶體位址，前面0x8000...都是NX bit或是其他功能，所以必須在計算physical address時使用`PTE_PFN_MASK`過濾掉第52 bits以上及後面12 bits，得到的才是實際physical frame number，加上offset 才會是physical address  \n\n![image](https://hackmd.io/_uploads/r1iZv-IWJe.png)\n\n\n\n\n\n# \u003cfont color=\"#F7A004\"\u003eReferenced\u003c/font\u003e\n\n* [linux系统中copy_to_user()函数和copy_from_user()函数的用法](https://blog.csdn.net/bhniunan/article/details/104088763)\n* [where is base register of page table?](https://www.csie.ntu.edu.tw/~wcchen/asm98/asm/proj/b85506061/chap2/paging.html)\n* [定址方式](https://www.csie.ntu.edu.tw/~wcchen/asm98/asm/proj/b85506061/chap2/overview.html)\n* [實作一個回傳物理位址的系統呼叫](https://hackmd.io/@Mes/make_phy_addr_syscall#%E4%BF%AE%E6%94%B9-syscall_64tbl)\n* [add a system call to kernel (v5.15.137)](https://hackmd.io/aist49C9R46-vaBIlP3LDA?view#add-a-system-call-to-kernel-v515137)\n* [Kernel 的替換 \u0026 syscall 的添加](https://satin-eyebrow-f76.notion.site/Kernel-syscall-3ec38210bb1f4d289850c549def29f9f)\n* [關於Linux尋址及page table的一些細節](https://www.cnblogs.com/QiQi-Robotics/p/15630380.html)\n* [SYSCALL_DEFINEx宏源码解析](https://blog.csdn.net/qq_41345173/article/details/104071618)\n* [Linux Kernel](https://hackmd.io/@eugenechou/H1LGA9AiB#Project-1)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgary7102%2Flinux-get-physical-address","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgary7102%2Flinux-get-physical-address","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgary7102%2Flinux-get-physical-address/lists"}