Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/trou/apache-module-ida-til

Making Type Info Library (TIL) file for Apache modules
https://github.com/trou/apache-module-ida-til

Last synced: 2 months ago
JSON representation

Making Type Info Library (TIL) file for Apache modules

Awesome Lists containing this project

README

        

# Creating TIL files for IDA

## Intro

Creating a *Type Information Library* makes it easier to reverse engineer
binaries by providing IDA with detailed and acurate information about types.

Types include:

* function prototypes
* structures
* enums

The main point is that IDA will apply function prototypes to the imports
and include the relevant data types in the database.

## Creating a TIL file for Apache

As an example, we will create a TIL file which can help reversing Apache modules.

Everything here will be done on a Debian Sid, amd64 from March 2021, but most of
it will work on most Linux distros.

### Prerequisites

We need the source code of the libraries we want to analyze. My target used
Apache 2.2, so let's fetch it:

```
wget https://archive.apache.org/dist/httpd/httpd-2.2.34.tar.bz2
wget https://archive.apache.org/dist/httpd/httpd-2.2.34.tar.bz2.asc
curl https://downloads.apache.org/httpd/KEYS | gpg2 --import
gpg2 --verify httpd-2.2.34.tar.bz2.asc httpd-2.2.34.tar.bz2
```

The archive contains things we want to include in our TIL:

* the headers for writing modules
* the Apache Runtime (apr) lib

First, we need to do a `./configure` to have the right headers generated.
Of course, this phase will need to reflect the configuration that was used
by your target.

In my case, the binary was compiled with `GCC: (GNU) 3.2.3 20030502 (Red Hat Linux 3.2.3-56)`, which
is *ancient*. But in theory, there should not be real differences in ABI between a recent and old
GCC compiler on Linux amd64, so let's proceed anyway.

## TIL

### Compiler config

First, we need to get the right configuration for the compiler options in `tilib`: depending on
the architecture and target ABI, the structures padding, type sizes, etc. will vary.

This is the "documentation":

```
$ ./tilib -C?
-C... specifies the compiler information
It has the -Cx# form, where # - value, x is one of the following:
c-compiler id, m-model, p-sizeof(near*), g-defalign (0/1/2/4/8/6 for16)
b-sizeof(bool), e-sizeof(enum), i-sizeof(int), s-sizeof(short)
l-sizeof(long), L-sizeof(longlong), R-explicit stack offsets
v-calling convention, B-bitness (3 for 32 or 6 for 64), D-sizeof(long double)
8-4 byte alignment for 8byte scalars (__int64/double) inside structures (y/n)
a-shorthand for cmpgbeislLvB8. The default is us40144248i3n
Compiler ids: Pointer sizes:
0 or u: Unknown 1: sizeof(near*)=1, sizeof(far*)=2
1 or v: Visual C++ 2: sizeof(near*)=2, sizeof(far*)=4
2 or b: Borland C++ 4: sizeof(near*)=4, sizeof(far*)=6
3 or w: Watcom C++ 8: sizeof(near*)=8, sizeof(far*)=8
6 or g: GNU C++ Memory models:
7 or a: Visual Age C++ s: small (code=near, data=near)
8 or d: Delphi l: large (code=far, data=far)
c: compact (code=near, data=far)
m: medium (code=far, data=near)
Calling conventions:
i: invalid s: stdcall u: unknown (default)
v: void p: pascall
c: cdecl r: fastcall
e: (...) t: thiscall
For example, BCC small model v3.1: -Cabs2122224
GNU C++: -Cags44444248u
```

As you can see, `-C` is *difficult* to master. Here's how to read the
-`Cags44444` which you can find in tilib's `gcc.cfg`:

```
; from GCC 32 config:
; -Cags44444
; cmpgbeislLvB8 (expansion for for "Ca")
; us40144248i3n (default)
; gs44444
; |||||||||||||_ 8bytes scalars alignment
; ||||||||||||__ bitness
; |||||||||||___ calling convention
; ||||||||||____ sizeof(longlong)
; |||||||||_____ sizeof(long) :
; ||||||||______ sizeof(short) : 4
; |||||||_______ sizeof(int) : 4
; ||||||________ sizeof(enum) : 4
; |||||_________ sizeof(bool) : 4
; ||||__________ defalign: 4
; |||___________ pointer size: 4
; ||____________ mem model: small
; |_____________ compiler: gcc
```

#### Creating our own config

* Use `sizes.c`
* `cp gcc.cfg gcc64.cfg`
* Update `gcc64.cfg`

**Note:** the (updated) `gcc64.cfg` was provided by Igor Skochinsky from Hex-Rays, I just added the comments.

#### Building TIL steps

First we need to make a top level header which includes everything: `apache_all.h`.

Then, we will preprocess it using `gcc -E` to preprocess everything and facilitate
the ingestion by `tilib`.

Then we begin the loop of fixing errors and warnings.

The most important hacks are:

* Adding `#define __asm__(arg)` to our `apache_all.h` file, to "nop" inline asm
* Adding `-D__extension__= \` to the `tilib` call, which will "nop" the unsupported `__extension__` keyword
* Adding `"-D__builtin_va_list=void *"` which will work around the need for the internal definition of `va_list`
* Add `-D__UNKNOWN_ATTR__=UNKNOWN_ATTR` in `gcc64.cfg`

Of course the command line options could be included in the `.cfg` file.

See `make_til.sh` for the final result.

#### Fixing "opaque" structures

Identify which structures have no "size" in the .til file:

```
$ tilib -l apache22-debian64.til | grep "FFFFFFFF struct"
[...]
FFFFFFFF struct ap_conf_vector_t;
FFFFFFFF struct ap_filter_provider_t;
FFFFFFFF struct apr_allocator_t;
FFFFFFFF struct apr_bucket_alloc_t;
[...]
```

some are opaque by "design", such as `ap_conf_vector_t`, others should be added
in the `apache_all.h` file by copy pasting.

# Result

The TIL file should be put inside `til/pc` in IDA dir to be discovered.

After loading the TIL file (Shift-F11, Insert), and defining the module export as `module`, note
how all the Apache related imports are now in **bold**, with their types defined:
![Before / After](img/before_after1.png)