https://github.com/dabeaz/bitey

Bitey - Bitcode Import Tool
===========================
Import LLVM bitcode directly into Python and use it as an extension module.

Warning
=======
THIS IS PROOF-OF-CONCEPT SOFTWARE THAT DESPITE ITS CUTE NAME, MIGHT
ACTUALLY BITE YOUR ENTIRE LEG OFF. USE AT YOUR OWN RISK!

Requirements
============
You'll need to have a pretty complete LLVM development environment
installed on your machine. Bitey has been developed using
LLVM/Clang-3.1. You might need to install it yourself.

In addition, you need to install the ``llvm-py`` extension. Get it at
http://www.llvmpy.org.

Bitey is unlikely to work with any older version of LLVM or the llvm-py
extension, especially preinstalled versions distributed with your
operating system. You need to be using bleeding-edge modern versions
of these libraries.

Example and Basic Tutorial
==========================
First, you need some C code. Something important like computing a
fibonacci number::

    /* fib.c */
    int fib(int n) {
        if (n < 3) {
            return 1;
        } else {
            return fib(n-1) + fib(n-2);
        }
    }

Now, compile it into LLVM bitcode using clang::

    bash % clang -emit-llvm -c fib.c

This produces an object file ``fib.o`` as usual, except that it contains
LLVM bitcode rather than native machine code. Now, just import it into Python::

    >>> import bitey
    >>> import fib
    >>> fib.fib(38)
    39088169
    >>>
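
If an import like this fails, it can help to confirm that clang really
produced bitcode rather than a native object. The sketch below is not
part of Bitey; ``object_kind`` is a hypothetical helper that inspects the
first four bytes of a file. The magic values are LLVM's raw bitcode
signature (``BC`` followed by ``0xC0DE``), LLVM's optional bitcode
wrapper header (``0x0B17C0DE``, stored little-endian), and the ELF and
64-bit little-endian Mach-O signatures.

```python
def object_kind(path):
    """Classify a compiled object file by its leading magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"BC\xc0\xde":
        return "llvm-bitcode"          # raw LLVM bitcode
    if magic == b"\xde\xc0\x17\x0b":
        return "llvm-bitcode-wrapper"  # 0x0B17C0DE wrapper header, little-endian
    if magic == b"\x7fELF":
        return "elf"                   # native object on Linux and most Unixes
    if magic == b"\xcf\xfa\xed\xfe":
        return "mach-o-64"             # native 64-bit object on OS-X
    return "unknown"
```

After the clang step above, ``object_kind("fib.o")`` is expected to
return one of the two ``llvm-*`` labels; ``"elf"`` or ``"mach-o-64"``
suggests that ``-emit-llvm`` was left off the command line.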

Yes, that's it. Bitey does not use the C compiler, the linker, or the
dynamic loader. You don't write wrapper functions either.
Write normal C, compile it with clang, and import it. Done.

Bitey understands most basic C datatypes, including integers, floats,
void, pointers, arrays, and structures. Because it builds a
``ctypes``-based interface, you access the code using the same
techniques you would use with ``ctypes``. Here is an example that
mutates a value through a pointer::

    /* mutate.c */

    void mutate_int(int *x) {
        *x *= 2;
    }

Here's how you would use this::

    % clang -emit-llvm -c mutate.c
    % python
    >>> import bitey
    >>> import mutate
    >>> import ctypes
    >>> x = ctypes.c_int(2)
    >>> mutate.mutate_int(x)
    >>> x.value
    4
    >>>
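
The pass-by-reference behavior comes from ``ctypes`` itself, so it can
be tried without Bitey or LLVM. This sketch builds a C-callable function
pointer from a Python function with ``ctypes.CFUNCTYPE``, using the same
signature as ``mutate_int``. It only stands in for the compiled bitcode
and is not how Bitey creates its bindings. When a parameter is declared
as ``POINTER(c_int)``, ``ctypes`` accepts a ``c_int`` instance and passes
its address automatically, which is presumably why
``mutate.mutate_int(x)`` works in the session above.

```python
import ctypes

# Same signature as the C prototype: void mutate_int(int *x)
MUTATE_INT_PROTO = ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_int))


def _mutate_int(ptr):
    # ptr arrives as a POINTER(c_int); ptr[0] dereferences it like *x in C
    ptr[0] *= 2


# Stand-in for mutate.mutate_int: a C-callable function pointer built from
# Python code instead of from compiled bitcode.
mutate_int = MUTATE_INT_PROTO(_mutate_int)

x = ctypes.c_int(2)
mutate_int(x)                # a c_int instance is passed by reference automatically
y = ctypes.c_int(5)
mutate_int(ctypes.byref(y))  # an explicit byref() works too
print(x.value, y.value)
```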

Here is an example involving a structure::

    /* point.c */
    #include <math.h>

    struct Point {
        double x;
        double y;
    };

    double distance(struct Point *p1, struct Point *p2) {
        return sqrt((p1->x - p2->x)*(p1->x - p2->x) +
                    (p1->y - p2->y)*(p1->y - p2->y));
    }

To run::

    % clang -emit-llvm -c point.c
    % python
    >>> import bitey
    >>> import point
    >>> p1 = point.Point(3,4)
    >>> p2 = point.Point(6,8)
    >>> point.distance(p1,p2)
    5.0
    >>>
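
As with ``mutate_int``, the pointer-to-structure handling is ordinary
``ctypes`` behavior. The sketch below defines, by hand, a
``ctypes.Structure`` that mirrors ``struct Point`` and wraps a Python
function with the same signature as ``distance`` using
``ctypes.CFUNCTYPE``. It stands in for the bitcode and is not the code
Bitey generates. Passing ``Point`` instances where ``POINTER(Point)`` is
declared sends their addresses, just as in C, and because the members are
``c_double`` they read back as Python floats.

```python
import ctypes
import math


class Point(ctypes.Structure):
    # Hand-written mirror of: struct Point { double x; double y; };
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]


# Same signature as: double distance(struct Point *p1, struct Point *p2)
DISTANCE_PROTO = ctypes.CFUNCTYPE(ctypes.c_double,
                                  ctypes.POINTER(Point), ctypes.POINTER(Point))


def _distance(p1, p2):
    # Each argument arrives as a POINTER(Point); .contents dereferences it
    a, b = p1.contents, p2.contents
    return math.hypot(a.x - b.x, a.y - b.y)


distance = DISTANCE_PROTO(_distance)  # stand-in for point.distance

p1 = Point(3, 4)  # integer arguments are converted to c_double
p2 = Point(6, 8)
print(distance(p1, p2))
```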

One subtle issue with structure wrapping is that LLVM bitcode doesn't
record the names of structure fields. Bitey therefore gives each field
a positional name (``e0``, ``e1``, and so on), like this::

    >>> p1.e0 # (Returns the .x component)
    3.0
    >>> p1.e1 # (Returns the .y component)
    4.0

This can be fixed using a pre-load file, as described in the
"Advanced Usage" section later.

If you need to combine two LLVM object files together into a single
importable module, use ``llvm-ld`` like this::

    % llvm-ld point.o fib.o -b combined.o
    % python
    >>> import bitey
    >>> import combined
    >>> combined.fib(10)
    55
    >>> p1 = combined.Point(3,4)
    >>> p2 = combined.Point(6,8)
    >>> combined.distance(p1,p2)
    5.0
    >>>

The C code you write can link with external libraries, but you might
need to take special steps to load the library prior to import. For
example, suppose you compiled the Fibonacci code into a shared library
like this::

    # OS-X
    % gcc -bundle -export_dynamic fib.c -o libfib.so

    # Linux
    % gcc -shared -fPIC fib.c -o libfib.so

Now, suppose you had some C code that wanted to access this library::

    /* sample.c */
    #include <stdio.h>
    extern int fib(int n);

    void print_fib(int n) {
        while (n > 0) {
            printf("%d\n", fib(n));
            n--;
        }
    }

If you compile this file and import it without loading the library first, you'll get an error::

    % clang -emit-llvm -c sample.c
    % python
    >>> import bitey
    >>> import sample
    LLVM ERROR: Program used external function 'fib' which could not be resolved!
    %

However, you can load the library yourself before importing the module::

    % python
    >>> import bitey
    >>> bitey.load_library("./libfib.so")

    >>> import sample
    >>> sample.print_fib(10)
    55
    34
    21
    13
    8
    5
    3
    2
    1
    1
    >>>
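
Bitey's ``load_library`` implementation is not shown in this README. One
plausible way to get the same effect with plain ``ctypes`` is to open the
library with ``RTLD_GLOBAL``, which adds its symbols to the process-wide
namespace that later symbol lookups (such as the JIT resolving ``fib``)
can search. The helper below is a hypothetical sketch under that
assumption, not Bitey's code.

```python
import ctypes


def load_library(path):
    """Hypothetical stand-in for bitey.load_library (Bitey's code may differ).

    RTLD_GLOBAL places the library's symbols in the global namespace, so
    code loaded afterwards can resolve external references such as fib().
    """
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
```

Called as ``load_library("./libfib.so")``, it also returns a normal
``ctypes`` handle, so the library's own functions (for example
``lib.fib``) remain callable directly through ``ctypes`` if needed.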

It is important to note that Bitey is NOT a wrapper generator meant to
access already-compiled C libraries. It only exposes functionality
that has been explicitly compiled as LLVM bitcode. To access the
contents of a library, you would need to compile and link it using
``clang`` and ``llvm-ld`` as shown in the examples.

How it works
============
Bitey extends Python with an import hook that looks for ``.o`` files
containing LLVM bitcode. Type signatures and other information in the
bitcode are then used to build a ``ctypes``-based binding to the natively
compiled functions held in an LLVM execution engine. It's all a bit
magical: the LLVM JIT generates the executable code, and Bitey builds
the ``ctypes`` binding to it, all behind the scenes on import.
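
Bitey targets Python 2's import machinery, which is not reproduced
here. The sketch below illustrates the same idea with Python 3's
``importlib``: a meta-path finder that looks for ``<name>.o`` on
``sys.path``, accepts it only if it starts with the raw LLVM bitcode
magic, and hands it to a loader. The class names are hypothetical, and
the loader merely stores the bytes where a real implementation would
JIT-compile the bitcode and attach ``ctypes`` wrappers.

```python
import importlib.abc
import importlib.util
import os
import sys

BITCODE_MAGIC = b"BC\xc0\xde"  # raw LLVM bitcode starts with 'B', 'C', 0xC0, 0xDE


def is_bitcode(path):
    """Return True if `path` is a readable file starting with the bitcode magic."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == BITCODE_MAGIC
    except OSError:
        return False


class BitcodeLoader(importlib.abc.Loader):
    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # let importlib create an ordinary module object

    def exec_module(self, module):
        # Placeholder for the real work (JIT-compiling the bitcode and
        # attaching ctypes wrappers); this sketch only stores the raw bytes.
        with open(self.path, "rb") as f:
            module.__bitcode__ = f.read()


class BitcodeFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None  # only top-level modules, to keep the sketch short
        for entry in sys.path:
            candidate = os.path.join(entry or os.curdir, fullname + ".o")
            if is_bitcode(candidate):
                return importlib.util.spec_from_file_location(
                    fullname, candidate, loader=BitcodeLoader(candidate))
        return None  # defer to the normal import machinery


# Consult this finder before the standard ones.
sys.meta_path.insert(0, BitcodeFinder())
```

Putting the finder at the front of ``sys.meta_path`` means a bitcode
``.o`` file wins over a same-named ``.py`` or extension module in the
same directory; appending it instead would give the normal machinery
priority.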

It's important to stress that Bitey does not use the C compiler, the
linker, the dynamic loader, or make calls to subprocesses. It is
completely self-contained and only uses the functionality of
``llvm-py`` and ``ctypes``.

Performance
===========
The performance profile of Bitey should be virtually identical to that
of ``ctypes``. LLVM bitcode is translated to native machine code, and
Bitey builds a ``ctypes``-based interface to it exactly as you would for
a normal C shared library.

As a performance experiment, here is a simple C function that checks
if a number is prime or not::

    /* isprime.c */
    int isprime(int n) {
        int factor = 3;
        /* 0, 1, and negative numbers are not prime */
        if (n < 2) {
            return 0;
        }
        /* Special case for 2 */
        if (n == 2) {
            return 1;
        }
        /* Check for even numbers */
        if ((n % 2) == 0) {
            return 0;
        }
        /* Check odd factors up to and including sqrt(n) */
        while (factor*factor <= n) {
            if ((n % factor) == 0) {
                return 0;
            }
            factor += 2;
        }
        return 1;
    }
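
Before timing anything, it is worth checking that both compiled versions
return the same answers. A pure-Python trial-division reference such as
the hypothetical ``isprime_ref`` below can serve as the baseline. It
rejects numbers below 2 and tests odd factors while
``factor * factor <= n``; with a strict ``<`` instead, squares of primes
such as 9 and 25 would be misreported as prime.

```python
def isprime_ref(n):
    """Pure-Python trial division, returning 1 for primes and 0 otherwise."""
    if n < 2:
        return 0          # 0, 1 and negatives are not prime
    if n == 2:
        return 1
    if n % 2 == 0:
        return 0          # other even numbers
    factor = 3
    while factor * factor <= n:  # <= so that 9, 25, 49, ... are caught
        if n % factor == 0:
            return 0
        factor += 2
    return 1
```

Comparing, say, ``isprime1(n) == isprime_ref(n)`` over ``range(10000)``
gives a quick sanity check on both bindings.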

Try compiling this code into both LLVM bitcode and a C shared library::

    % clang -O3 -emit-llvm -c isprime.c

    # OS-X
    % gcc -O3 -bundle -undefined dynamic_lookup isprime.c -o isprime.so

    # Linux
    % gcc -O3 -shared -fPIC isprime.c -o isprime.so

Now, let's put Bitey and ctypes in a head-to-head performance battle::

    >>> import bitey
    >>> from isprime import isprime as isprime1
    >>> import ctypes
    >>> ex = ctypes.cdll.LoadLibrary("./isprime.so")
    >>> isprime2 = ex.isprime
    >>> isprime2.argtypes=(ctypes.c_int,)
    >>> isprime2.restype=ctypes.c_int
    >>>
    >>> from timeit import timeit
    >>> # Bitey
    >>> timeit("isprime1(3)","from __main__ import isprime1")
    1.1813910007476807
    >>> # ctypes
    >>> timeit("isprime2(3)", "from __main__ import isprime2")
    1.2408909797668457
    >>>
    >>> # Bitey
    >>> timeit("isprime1(10143937)", "from __main__ import isprime1")
    9.839216947555542
    >>> # ctypes
    >>> timeit("isprime2(10143937)", "from __main__ import isprime2")
    9.663991212844849
    >>>

As you can see, the performance is just about the same. The main
difference would come down to the efficiency of LLVM vs. gcc code
optimization.
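
Single ``timeit`` calls like the ones above can be swayed by background
activity on the machine. The ``timeit`` documentation suggests repeating
a measurement and taking the lowest value as the most meaningful figure;
a small helper along those lines (the name ``best_time`` is
hypothetical) might look like this.

```python
from timeit import repeat


def best_time(func, arg, number=1_000_000, repeats=5):
    """Return the fastest of `repeats` runs, each calling func(arg) `number` times."""
    # Passing the names through `globals` replaces the
    # "from __main__ import ..." setup string used in the session above.
    timings = repeat("func(arg)", globals={"func": func, "arg": arg},
                     number=number, repeat=repeats)
    # The minimum is the run least disturbed by other activity.
    return min(timings)
```

With the setup from the session, ``best_time(isprime1, 10143937)`` and
``best_time(isprime2, 10143937)`` give directly comparable figures; for
slow inputs, lowering ``number`` keeps the total run time reasonable.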

Advanced Usage
==============
If you're up for a bit of adventure, the module creation process can
be altered through the use of pre-load and post-load files.

A pre-load file provides Python code that executes within the newly
created module prior to the LLVM-binding step. One use of this
code is to specify the names of fields on data structures. For
example, you can create the following pre-load file for the earlier
``Point`` example::

    # point.pre.py

    class Point:
        _fields_ = ['x','y']

If you do this, you'll find that the field-names get fixed::

    >>> import point
    >>> p = point.Point(3,4)
    >>> p.x
    3.0
    >>> p.y
    4.0
    >>>
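
Bitey's own code for this step isn't reproduced here, but the effect of
a ``_fields_`` list can be sketched with plain ``ctypes``: the member
types come from the bitcode, and the names come either from the
positional ``e0``, ``e1``, ... scheme or from the pre-load list.
``make_struct`` is a hypothetical helper written only for illustration.

```python
import ctypes


def make_struct(name, ctypes_fields, names=None):
    """Build a ctypes.Structure subclass from a list of member types.

    Without `names`, members get positional names e0, e1, ...; a
    pre-load style list such as ['x', 'y'] replaces them.
    """
    if names is None:
        names = ["e%d" % i for i in range(len(ctypes_fields))]
    if len(names) != len(ctypes_fields):
        raise ValueError("expected %d field names, got %d"
                         % (len(ctypes_fields), len(names)))
    # type(ctypes.Structure) is the Structure metaclass; calling it creates the class
    return type(ctypes.Structure)(name, (ctypes.Structure,),
                                  {"_fields_": list(zip(names, ctypes_fields))})


Anonymous = make_struct("Point", [ctypes.c_double, ctypes.c_double])
Named = make_struct("Point", [ctypes.c_double, ctypes.c_double], ["x", "y"])
```

Only the names differ between the two classes; the memory layout, and
therefore compatibility with the C function, stays the same.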

You could also use a pre-load file to load library dependencies::

    # sample.pre.py
    import bitey
    bitey.load_library("./libfoo.so")

A post-load file allows you to alter the contents of the module
after LLVM binding. You could use this to apply decorators
or add additional support code. For example::

    # point.post.py
    #
    # Example of decorating a function already wrapped

    def decorate(func):
        def wrapper(*args, **kwargs):
            print("Calling %s" % func.__name__)
            return func(*args, **kwargs)
        wrapper.__name__ = func.__name__
        return wrapper

    # Wrap the distance wrapper already created
    distance = decorate(distance)

The combination of pre-load and post-load files gives you almost
unlimited opportunity for insane evil when loading the bitcode.
It must be stressed that these files are executed in the namespace
of the module being created. They are not separate imports (i.e.,
the pre-load code, the post-load code, and the LLVM bindings all
coexist in the same module namespace).
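
The shared namespace can be pictured with the following sketch. It is a
hypothetical ``build_module`` helper, not Bitey's code: it creates one
module object, runs the pre-load source in its ``__dict__``, calls a
stand-in binding step that adds names to the same dictionary, and then
runs the post-load source there too, so each stage sees, and can
replace, whatever the earlier stages defined.

```python
import types


def build_module(name, pre_source, bind, post_source):
    """Run pre-load code, a binding step, and post-load code in one namespace."""
    module = types.ModuleType(name)
    ns = module.__dict__
    if pre_source:
        # Pre-load code runs first, e.g. to declare structure field names
        exec(compile(pre_source, name + ".pre.py", "exec"), ns)
    # Stand-in for the LLVM/ctypes binding step: it adds names to ns
    bind(ns)
    if post_source:
        # Post-load code sees both the pre-load names and the bindings
        exec(compile(post_source, name + ".post.py", "exec"), ns)
    return module
```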

Automatic Binding
=================
In the examples, it is necessary to use ``import bitey`` for modules
to be recognized and loaded. If you want to skip this step and make
everything automatic, create a ``bitey.pth`` file that contains the
following statement::

    # bitey.pth
    import bitey

Now, copy this file to the Python ``site-packages`` directory.
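
This works because Python's ``site`` module executes any line in a
``.pth`` file that begins with ``import`` when it processes a site
directory at startup. A hedged sketch of installing such a file from
Python follows: ``sysconfig.get_paths()["purelib"]`` is one standard way
to find the ``site-packages`` directory of the running interpreter
(writing there may require administrator rights), and ``install_pth`` is
a hypothetical helper.

```python
import os
import sysconfig


def install_pth(name="bitey.pth", line="import bitey\n", target=None):
    """Write a one-line .pth file and return its path (hypothetical helper)."""
    if target is None:
        # purelib is the site-packages directory of the running interpreter
        target = sysconfig.get_paths()["purelib"]
    path = os.path.join(target, name)
    with open(path, "w") as f:
        f.write(line)
    return path
```

Inside a virtual environment, ``purelib`` points at that environment's
own ``site-packages``, which keeps the automatic import from affecting
other projects.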

FAQ
===
Q: Will Bitey ever support C++?

A: No. C++ can bite me (*)

(*) I also wrote Swig and still have C++ scars.

Q: Why is it called "Bitey?"

A: Well, "Bitey" is so much more catchy than simply calling it
something boring like "BIT (Bitcode Import Tool)". Plus, just like
@johnderosa's pet Pomeranian of the same name, you're never quite sure
whether "Bitey" is adorably cute or a vicious beast that will
constantly nip your leg. Actually, I just like the ring of
it: "Bitey" sort of rhymes with "Enterprisey".

Discussion Group
================
A discussion group for Bitey is available at http://groups.google.com/group/bitey

Authors
============
- David Beazley (@dabeaz), http://www.dabeaz.com