Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/madsys-dev/syscall_intercept

The system call intercepting library
https://github.com/madsys-dev/syscall_intercept

Last synced: about 2 months ago
JSON representation

The system call intercepting library

Awesome Lists containing this project

README

        

# syscall_intercept

[![Build Status](https://travis-ci.org/pmem/syscall_intercept.svg)](https://travis-ci.org/pmem/syscall_intercept)
[![Coverage Status](https://codecov.io/github/pmem/syscall_intercept/coverage.svg)](https://codecov.io/gh/pmem/syscall_intercept)
[![Coverity Scan Build Status](https://scan.coverity.com/projects/12890/badge.svg)](https://scan.coverity.com/projects/syscall_intercept)

Userspace syscall intercepting library.

# Dependencies #

## Runtime dependencies ##

* libcapstone -- the disassembly engine used under the hood

## Build dependencies ##

# Local build dependencies #

* C99 toolchain -- tested with recent versions of GCC and clang
* cmake
* perl -- for checking coding style
* pandoc -- for generating the man page

### Travis CI build dependencies ###

The travis builds use some scripts to generate a docker images, in which syscall_intercept is built/tested.
These docker images are pushed to Dockerhub, to be reused in later travis builds.
The scripts expect four environment variables to be set in the travis environment:
* DOCKERHUB_REPO - where to store the docker images used for building
e.g. in order to refer to a Dockerhub repository at https://hub.docker.com/r/pmem/syscall_intercept, this variable
should contain the string "pmem/syscall_intercept"
* DOCKERHUB_USER - used for logging into Dockerhub
* DOCKERHUB_PASSWORD - used for logging into Dockerhub
* GITHUB_REPO - where the repository is available on github (e.g. "pmem/syscall_intercept" )

### How to build ###

Building libsyscall_intercept requires cmake.
Example:
```sh
cmake path_to_syscall_intercept -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang
make
```
alternatively:
```sh
ccmake path_to_syscall_intercept
make
```

There is an install target. For now, all it does, is cp.
```sh
make install
```

Coming soon:
```sh
make test
```

# Synopsis #

```c
#include
```
```sh
cc -lsyscall_intercept -fpic -shared source.c -o preloadlib.so

LD_PRELOAD=preloadlib.so ./application
```

##### Description: #####

The system call intercepting library provides a low-level interface
for hooking Linux system calls in user space. This is achieved
by hotpatching the machine code of the standard C library in the
memory of a process. The user of this library can provide the
functionality of almost any syscall in user space, using the very
simple API specified in the libsyscall_intercept\_hook\_point.h header file:
```c
int (*intercept_hook_point)(long syscall_number,
long arg0, long arg1,
long arg2, long arg3,
long arg4, long arg5,
long *result);
```

The user of the library shall assign to the variable called
intercept_hook_point a pointer to the address of a callback function.
A non-zero return value returned by the callback function is used
to signal to the intercepting library that the specific system
call was ignored by the user and the original syscall should be
executed. A zero return value signals that the user takes over the
system call. In this case, the result of the system call
(the value stored in the RAX register after the system call)
can be set via the *result pointer. In order to use the library,
the intercepting code is expected to be loaded using the
LD_PRELOAD feature provided by the system loader.

All syscalls issued by libc are intercepted. Syscalls made
by code outside libc are not intercepted. In order to
be able to issue syscalls that are not intercepted, a
convenience function is provided by the library:
```c
long syscall_no_intercept(long syscall_number, ...);
```

Three environment variables control the operation of the library:

*INTERCEPT_LOG* -- when set, the library logs each syscall intercepted
to a file. If it ends with "-" the path of the file is formed by appending
a process id to the value provided in the environment variable.
E.g.: initializing the library in a process with pid 123 when the
INTERCEPT_LOG is set to "intercept.log-" will result in a log file named
intercept.log-123.

*INTERCEPT_LOG_TRUNC -- when set to 0, the log file from INTERCEPT_LOG
is not truncated.

*INTERCEPT_HOOK_CMDLINE_FILTER* -- when set, the library
checks the command line used to start the program.
Hotpatching, and syscall intercepting is only done, if the
last component of the command used to start the program
is the same as the string provided in the environment variable.
This can also be queried by the user of the library:
```c
int syscall_hook_in_process_allowed(void);
```

##### Example: #####

```c
#include
#include
#include

static int
hook(long syscall_number,
long arg0, long arg1,
long arg2, long arg3,
long arg4, long arg5,
long *result)
{
if (syscall_number == SYS_getdents) {
/*
* Prevent the application from
* using the getdents syscall. From
* the point of view of the calling
* process, it is as if the kernel
* would return the ENOTSUP error
* code from the syscall.
*/
*result = -ENOTSUP;
return 0;
} else {
/*
* Ignore any other syscalls
* i.e.: pass them on to the kernel
* as would normally happen.
*/
return 1;
}
}

static __attribute__((constructor)) void
init(void)
{
// Set up the callback function
intercept_hook_point = hook;
}
```

```sh
$ cc example.c -lsyscall_intercept -fpic -shared -o example.so
$ LD_LIBRARY_PATH=. LD_PRELOAD=example.so ls
ls: reading directory '.': Operation not supported
```

# Under the hood: #

##### Assumptions: #####
In order to handle syscalls in user space, the library relies
on the following assumptions:

- Each syscall made by the applicaton is issued via libc
- No other facility attempts to hotpatch libc in the same process
- The libc implementation is already loaded in the processes
memory space when the intercepting library is being initialized
- The machine code in the libc implementation is suitable
for the methods listed in this section
- For some more basic assumptions, see the section on limitations.

##### Disassembly: #####
The library disassembles the text segment of the libc loaded
into the memory space of the process it is initialized in. It
locates all syscall instructions, and replaces each of them
with a jump to a unique address. Since the syscall instruction
of the x86_64 ISA occupies only two bytes, the method involves
locating other bytes close to the syscall suitable for overwriting.
The destination of the jump (unique for each syscall) is a
small routine, which accomplishes the following tasks:

1. Optionally executes any instruction that originally
preceded the syscall instruction, and was overwritten to
make space for the jump instruction
2. Saves the current state of all registers to the stack
3. Translates the arguments (in the registers) from
the Linux x86_64 syscall calling convention to the C ABI's
calling convention used on x86_64
4. Calls a function written in C (which in turn calls
the callback supplied by the library user)
5. Loads the values from the stack back into the registers
6. Jumps back to libc, to the instruction following the
overwritten part

##### In action: #####

*Simple hotpatching:*
Replace a mov and a syscall instruction with a jmp instruction
```
Before: After:

db2a0 <__open>: db2b0 <__open>:
db2aa: mov $2, %eax /-db2aa: jmp e0000
db2af: syscall |
db2b1: cmp $-4095, %rax | db2b1: cmp $-4095, %rax ---\
db2b7: jae db2ea | db2b7: jae db2ea |
db2b9: retq | db2b9: retq |
| ... |
| ... |
\_... |
e0000: mov $2, $eax |
... |
e0100: call implementation /
... /
e0200: jmp db2aa ________/
```
*Hotpatching using a trampoline jump:*
Replace a syscall instruction with a short jmp instruction,
the destination of which is a regular jmp instruction.
The reason to use this, is that a short jmp instruction
consumes only two bytes, thus fits in the place of a syscall
instruction. Sometimes the instructions directly preceding
or following the syscall instruction can not be overwritten,
leaving only the two bytes of the syscall instruction
for patching.
The hotpatching library looks for place for the trampoline jump
in the padding found to the end of each routine. Since the start
of all routines is aligned to 16 bytes, often there is a padding
space between the end of a symbol, and the start of the next symbol.
In the example below, this padding is filled with 7 byte long
nop instruction (so the next symbol can start at the address 3f410).
```
Before: After:

3f3fe: mov %rdi, %rbx 3f3fe: mov %rdi, %rbx
3f401: syscall /-3f401: jmp 3f430
3f403: jmp 3f415 | 3f403: jmp 3f415 ----------\
3f407: retq | 3f407: retq |
\ |
3f408: nopl 0x0(%rax,%rax,1) /-3f408: jmp e1000 |
| ... |
| ... |
\_... |
e1000: nop |
... |
e1100: call implementation /
... /
e1200: jmp 3f403 ________/

```

# Limitations: #
* Only Linux is supported
* Only x86\_64 is supported
* Only tested with glibc, although perhaps it works
with some other libc implementations as well
* There are known issues with the following syscalls:
* clone
* rt_sigreturn

# Debugging: #
Besides logging, the most important factor during debugging is to make
sure the syscalls in the debugger are not intercepted. To achieve this, use
the INTERCEPT_HOOK_CMDLINE_FILTER variable described above.

```
INTERCEPT_HOOK_CMDLINE_FILTER=ls \
LD_PRELOAD=libsyscall_intercept.so \
gdb ls
```

With this filtering, the intercepting library is not activated in the gdb
process itself.