{"id":21940367,"url":"https://github.com/eecheng87/esca","last_synced_at":"2025-04-22T15:43:18.118Z","repository":{"id":37926729,"uuid":"339583140","full_name":"eecheng87/ESCA","owner":"eecheng87","description":"Effective System Call Aggregation","archived":false,"fork":false,"pushed_at":"2022-11-03T15:55:27.000Z","size":1912,"stargazers_count":38,"open_issues_count":8,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-14T03:52:45.919Z","etag":null,"topics":["kernel-module","linux-kernel"],"latest_commit_sha":null,"homepage":"https://eecheng87.github.io/ESCA/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eecheng87.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-17T01:48:29.000Z","updated_at":"2024-05-27T18:20:29.000Z","dependencies_parsed_at":"2022-07-07T23:13:07.750Z","dependency_job_id":null,"html_url":"https://github.com/eecheng87/ESCA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eecheng87%2FESCA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eecheng87%2FESCA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eecheng87%2FESCA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eecheng87%2FESCA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eecheng87","download_url":"https://codeload.github.com/eecheng87/ESCA/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_c
ount":250270201,"owners_count":21403008,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["kernel-module","linux-kernel"],"created_at":"2024-11-29T02:32:38.081Z","updated_at":"2025-04-22T15:43:18.085Z","avatar_url":"https://github.com/eecheng87.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Effective System Call Aggregation (ESCA)\n\nThe main objective of this work is to reduce per-syscall overhead through effective syscall aggregation.\nFor that purpose, ESCA takes advantage of system call batching and exploits the parallelism of event-driven applications by leveraging the Linux I/O model to overcome the disadvantages of previous solutions.\n\nESCA is capable of reducing the per-syscall overhead by up to 62% for embedded web servers.\nReal-world highly concurrent event-driven applications such as Nginx and Redis are known to benefit from ESCA, which remains fully compatible with Linux syscall semantics and functionality.\n\n## Prerequisite\nFor Nginx and wrk:\n```shell\nsudo apt install build-essential libpcre3 libpcre3-dev zlib1g zlib1g-dev\nsudo apt install libssl-dev libgd-dev libxml2 libxml2-dev uuid-dev\nsudo apt install autoconf automake libtool\n```\n\n## Download project\n```shell\ngit clone https://github.com/eecheng87/ESCA\ncd ESCA\n```\n\n## Build from source\nCompile the files under the `lkm/` and `wrapper/` directories (the default target is lwan):\n```shell\nmake TARGET=\u003cnginx | lighttpd | lwan\u003e\n```\n\n### Build adaptation target\nBuild `wrk`\n```shell\nmake wrk\n```\n\nDownload and build 
nginx\n```shell\nmake nginx\n```\n\nDownload and build lighttpd\n```shell\nmake lighttpd\n```\n\nDownload and build lwan\n```shell\nmake lwan\n```\n\n## Testing\n\n### Launch Nginx\nChoose either\n```shell\nmake nginx-launch # original nginx\n```\nor\n\n```shell\nmake load-lkm\nmake nginx-esca-launch # nginx-esca\n```\n\n### Launch lighttpd\nChoose either\n```shell\nmake lighttpd-launch # original lighttpd\n```\nor\n\n```shell\nmake load-lkm\nmake lighttpd-esca-launch # lighttpd-esca\n```\n\n### Launch lwan\nChoose either\n```shell\nmake lwan-launch # original lwan\n```\nor\n\n```shell\nmake load-lkm\nmake lwan-esca-launch # lwan-esca\n```\n\n### Download workloads\n```shell\ngit submodule init\ngit submodule update\n```\n\n### Benchmarking\n```shell\n# nginx is at port 8081; lighttpd is at port 3000; lwan is at port 8080\ndownloads/wrk-master/wrk -c 50 -d 5s -t 4 http://localhost:8081/a20.html\n```\n\n### Demo\n![image](assets/demo.gif)\n\nNginx-ESCA outperformed vanilla Nginx by about 11%.\n\n![image](assets/light-demo.gif)\n\nlighttpd-ESCA outperformed vanilla lighttpd by about 13%.\n\n![image](assets/lwan-demo.gif)\n\nlwan-ESCA outperformed vanilla lwan by about 30%.\n\n## Technical Description\nThe code section enclosed by `batch_start()` and `batch_flush()` is called a batching segment.\nIt can appear more than once in a single application.\nCompared with typical syscalls, ESCA eliminates mode switches in batching segments by decoupling syscalls.\nInstead of switching to kernel mode and executing the corresponding service routine, syscalls in a batching segment only record their syscall ID and arguments in the shared table.\nAfter `batch_flush()` is invoked, ESCA switches to kernel mode, executes all syscalls in the shared table, and then switches back to user mode.\n\n### Typical system call flow\n![image](assets/syscall-flow.png)\n\n1. User application calls a system call\n2. Switch from user mode to kernel mode by an interrupt\n3. 
Search the interrupt vector table and call the interrupt service routine (ISR)\n4. The interrupt service routine searches the system call table\n5. Call the system call service routine\n6. After the system call service routine finishes, switch back to user mode\n\n### System Call wrappers\nBecause system calls are invoked through assembly routines, wrapping the assembly code in a wrapper function makes the program more readable.\nFor example:\n```c\n#define SYSCALL(name, a1, a2, a3, a4, a5, a6)                             \\\n    ({                                                                    \\\n        long result;                                                      \\\n        long __a1 = (long) (a1), __a2 = (long) (a2), __a3 = (long) (a3);  \\\n        long __a4 = (long) (a4), __a5 = (long) (a5), __a6 = (long) (a6);  \\\n        register long _a1 asm(\"rdi\") = __a1;                              \\\n        register long _a2 asm(\"rsi\") = __a2;                              \\\n        register long _a3 asm(\"rdx\") = __a3;                              \\\n        register long _a4 asm(\"r10\") = __a4;                              \\\n        register long _a5 asm(\"r8\") = __a5;                               \\\n        register long _a6 asm(\"r9\") = __a6;                               \\\n        asm volatile(\"syscall\\n\\t\"                                        \\\n                     : \"=a\"(result)                                       \\\n                     : \"0\"(name), \"r\"(_a1), \"r\"(_a2), \"r\"(_a3), \"r\"(_a4), \\\n                       \"r\"(_a5), \"r\"(_a6)                                 \\\n                     : \"memory\", \"cc\", \"r11\", \"cx\");                      \\\n        (long) result;                                                    \\\n    })\n\n#define SYSCALL1(name, a1) SYSCALL(name, a1, 0, 0, 0, 0, 0)\n#define SYSCALL2(name, a1, a2) SYSCALL(name, a1, a2, 0, 0, 0, 0)\n#define SYSCALL3(name, a1, a2, a3) 
SYSCALL(name, a1, a2, a3, 0, 0, 0)\n#define SYSCALL4(name, a1, a2, a3, a4) SYSCALL(name, a1, a2, a3, a4, 0, 0)\n#define SYSCALL5(name, a1, a2, a3, a4, a5) SYSCALL(name, a1, a2, a3, a4, a5, 0)\n#define SYSCALL6(name, a1, a2, a3, a4, a5, a6) \\\n    SYSCALL(name, a1, a2, a3, a4, a5, a6)\n\n/* wrapper function */\nstatic inline void *brk(void *addr)\n{\n    return (void *) SYSCALL1(__NR_brk, addr);\n}\n```\n\n### System Call Hooks\nESCA locates the address of the syscall table through the kernel symbol table and replaces the relevant syscall table entries with our customized handlers.\nBecause the syscall table is write-protected, the write protection bit of the control register must be cleared before the table is modified and set again afterwards.\n\nESCA intercepts two system calls:\n* `sys_batch`: iterates over the shared table, executes all syscalls recorded in it, and then switches back to user mode.\n* `sys_register`: maps the userspace shared table into kernel-space memory and initializes it.\n\nReplace system call handlers:\n```c\n// find out syscall table address\nscTab = (void **) (smSCTab + ((char *) \u0026system_wq - smSysWQ));\n// clear write protection bit\nallow_writes();\n\n/* back up original system call service routines */\nsys_oldcall0 = scTab[__NR_batch_flush];\nsys_oldcall1 = scTab[__NR_register];\n\n/* hooking */\nscTab[__NR_batch_flush] = sys_batch;\nscTab[__NR_register] = sys_register;\n\n// set write protection bit\ndisallow_writes();\n```\n\n### Share the same physical address space between kernel and user space\nESCA uses `get_user_pages` to obtain the physical pages that back the userspace memory, and `kmap` to map those pages into the kernel address space.\nIn this way, data is shared without copying, and the mapping is set up only once.\n\n* The `batch_register` syscall maps the userspace shared table into kernel-space memory and initializes it.\n```c\nasmlinkage long sys_register(const struct pt_regs *regs)\n{\n    int n_page, i, j;\n    unsigned long p1 = 
regs-\u003edi;\n\n    /* map batch table from user-space to kernel */\n    n_page = get_user_pages(\n        (p1),           /* Start address to map */\n        MAX_THREAD_NUM, /* Number of pages to pin; a page is 4096 bytes on this machine */\n        FOLL_FORCE | FOLL_WRITE, /* Force flag */\n        pinned_pages,            /* struct page ** pointer to pinned pages */\n        NULL);\n\n    for (i = 0; i \u003c MAX_THREAD_NUM; i++)\n        batch_table[i] = (struct batch_entry *) kmap(pinned_pages[i]);\n\n    /* initialize table status */\n    for (j = 0; j \u003c MAX_THREAD_NUM; j++)\n        for (i = 0; i \u003c MAX_ENTRY_NUM; i++)\n            batch_table[j][i].rstatus = BENTRY_EMPTY;\n\n    global_i = global_j = 0;\n\n    main_pid = current-\u003epid;\n\n    return 0;\n}\n```\n\n### Change the typical system call behavior\nTo change syscall behavior, the glibc syscall wrappers are replaced with those in our shared library through `LD_PRELOAD` when the application starts.\nThe customized syscall wrapper determines whether the system call is inside a batching segment.\n1. Outside a batching segment: call the original glibc syscall wrapper that was backed up.\n2. Inside a batching segment: record the syscall ID and arguments in the shared table.\n\nThe function `dlsym()` takes a \"handle\" of a dynamic library returned by `dlopen()` and a null-terminated symbol name, and returns the address where that symbol is loaded into memory.\n\nBecause the customized wrappers in the preloaded library shadow the glibc ones at run time, the original glibc wrappers must be backed up with `dlsym()`; passing `RTLD_NEXT` as the handle looks up the next definition of the symbol after the current library.\n```c\n__attribute__((constructor)) static void setup(void)\n{\n    pgsize = getpagesize();\n    in_segment = 0;\n    batch_num = 0;\n\n    /* save the original glibc functions */\n    real_writev = real_writev ? real_writev : dlsym(RTLD_NEXT, \"writev\");\n    real_shutdown =\n        real_shutdown ? real_shutdown : dlsym(RTLD_NEXT, \"shutdown\");\n    real_sendfile =\n        real_sendfile ? 
real_sendfile : dlsym(RTLD_NEXT, \"sendfile\");\n    real_send =\n        real_send ? real_send : dlsym(RTLD_NEXT, \"send\");\n\n    global_i = global_j = 0;\n}\n```\n\n## Citation\n\nPlease see our [PDP 2022](https://pdp2022.infor.uva.es/) paper, available in the [IEEE Xplore](https://ieeexplore.ieee.org/abstract/document/9756707) digital library; a [preprint copy](https://eecheng87.github.io/ESCA/main.pdf) is also available.\n\nIf you find this work useful in your research, please cite:\n```\n@inproceedings{cheng2022esca,\n    author={Cheng, Yu-Cheng and Huang, Ching-Chun (Jim) and Tu, Chia-Heng},\n    booktitle={2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)},\n    title={ESCA: Effective System Call Aggregation for Event-Driven Servers},\n    year={2022},\n    pages={18-25},\n    doi={10.1109/PDP55904.2022.00012}\n}\n```\n\n## License\n\n`ESCA` is released under the MIT license. Use of this source code is governed by\nan MIT-style license that can be found in the LICENSE file.\n\n## Reference\n* B. M. Michelson, \"Event-driven architecture overview,\" Patricia Seybold Group, vol. 2, no. 12, pp. 10–1571, 2006.\n* A. S. Rahul Jadhav, Zhen Cao, \"Improved system call batching for network I/O,\" 2019.\n* A. Purohit, J. Spadavecchia, C. Wright, and E. Zadok, \"Improving application performance through system call composition,\" 2003.\n* M. Rajagopalan, S. K. Debray, M. A. Hiltunen, and R. D. Schlichting, \"System call clustering: A profile-directed optimization technique,\" 2002.\n* D. Hansen, [KAISER: unmap most of the kernel from userspace page tables](https://lwn.net/Articles/738997/), 2017.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feecheng87%2Fesca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feecheng87%2Fesca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feecheng87%2Fesca/lists"}