https://github.com/zanellia/prometeo

An experimental Python-to-C transpiler and domain specific language for embedded high-performance computing
Host: GitHub
URL: https://github.com/zanellia/prometeo
Owner: zanellia
License: bsd-2-clause
Created: 2017-12-26T22:45:53.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2022-05-19T16:32:39.000Z (about 3 years ago)
Last Synced: 2025-05-10T15:54:02.392Z (about 2 months ago)
Topics: c, compiler, domain-specific-language, embedded-systems, high-performance-computing, hpc, python, python-to-c, source-to-source, static-analysis, static-typing, transcompiler, transpiler
Language: Python
Homepage:
Size: 1.93 MB
Stars: 633
Watchers: 15
Forks: 33
Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
![Travis Status](https://travis-ci.org/zanellia/prometeo.svg?branch=master) [![PyPI version fury.io](https://badge.fury.io/py/prometeo-dsl.svg)](https://pypi.org/project/prometeo-dsl/) [![Documentation Status](https://readthedocs.org/projects/prometeo/badge/?version=latest)](https://prometeo.readthedocs.io/en/latest/?badge=latest)

This is prometeo, an experimental modeling tool for embedded high-performance computing. prometeo provides a domain specific language (DSL) based on a subset of the Python language that allows one to conveniently write scientific computing programs in a high-level language (Python itself) that can be transpiled to high-performance self-contained C code easily deployable on embedded devices.

### features

- __Python compatible syntax__ : prometeo is a DSL embedded into the Python language. prometeo programs can be executed from the Python interpreter.
- __efficient__ : prometeo programs transpile to high-performance C code.
- __statically typed__ : prometeo uses Python's native type hints to strictly enforce static typing.
- __deterministic memory usage__ : a specific program structure is required and enforced through static analysis. In this way prometeo transpiled programs have a guaranteed maximum heap usage.
- __fast memory management__ : thanks to its static analysis, prometeo can avoid allocating and garbage-collecting memory, resulting in faster and safer execution.
- __self-contained and embeddable__ : unlike other similar tools and languages, prometeo specifically targets embedded applications, and programs written in prometeo transpile to self-contained C code that does not require linking against the Python run-time library.
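
The idea behind a guaranteed maximum heap usage can be illustrated with a small sketch. It is not taken from prometeo's source: it assumes that static analysis has already produced, for each function, the number of bytes it allocates and the functions it calls, that a callee's memory is released when it returns, and that the call graph has no recursion. Under those assumptions the worst case is the heaviest call path starting from `main`. All names and sizes below are hypothetical.

```python
from functools import lru_cache

# Hypothetical analysis results (not extracted from a real prometeo program):
# bytes allocated by each function body and the functions it may call.
ALLOCATED_BYTES = {"main": 800, "factorize": 2400, "solve": 1600, "helper": 200}
CALLS = {
    "main": ["factorize", "solve"],
    "factorize": ["helper"],
    "solve": [],
    "helper": [],
}


def worst_case_heap(entry: str) -> int:
    """Return the largest total allocation along any call path from `entry`.

    A callee's memory is assumed to be released when it returns, so sibling
    calls do not add up; only nested calls do. Requires an acyclic call graph.
    """

    @lru_cache(maxsize=None)
    def visit(name: str) -> int:
        deepest_callee = max((visit(c) for c in CALLS[name]), default=0)
        return ALLOCATED_BYTES[name] + deepest_callee

    return visit(entry)


print(worst_case_heap("main"))  # 800 + max(2400 + 200, 1600) = 3400
```

With such a bound known before execution, a single buffer of that size can be reserved once at start-up.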

### documentation (preliminary)

prometeo's documentation can be found on Read the Docs at [https://prometeo.readthedocs.io/en/latest/index.html](https://prometeo.readthedocs.io/en/latest/index.html).

      

### hello world!

A simple hello world example that shows how to either run a trivial prometeo program from Python or transpile it to C, build it and run it can be found [here](https://github.com/zanellia/prometeo/blob/master/examples/helloworld/helloworld.py). The output shows the outcome of the heap usage analysis and the execution time (in this case there is not much to see :p).







### performance

Since prometeo programs transpile to pure C code that calls the high-performance linear algebra library BLASFEO (publication: [https://arxiv.org/abs/1704.02457](https://arxiv.org/abs/1704.02457), code: [https://github.com/giaf/blasfeo](https://github.com/giaf/blasfeo)), execution time can be comparable to hand-written high-performance code. The figure below compares the CPU time needed to carry out a Riccati factorization using highly optimized hand-written C code with calls to BLASFEO against the CPU time of the code that prometeo transpiles from [this example](https://github.com/zanellia/prometeo/blob/master/examples/riccati_example/riccati_mass_spring.py). The computation times obtained with NumPy and Julia are included for comparison. Note, however, that these last two implementations of the Riccati factorization are **not as easily embeddable** as the C code generated by prometeo and the hand-coded C implementation. All benchmarks were run on a Dell XPS-9360 equipped with an i7-7560U CPU running at 2.30 GHz (to avoid frequency fluctuations due to thermal throttling).







Moreover, prometeo can largely outperform state-of-the-art Python compilers such as Nuitka. The table below shows the CPU times obtained on a Fibonacci benchmark.

|    parser/compiler   | CPU time [s] |
|:--------------------:|:------------:|
| Python 3.7 (CPython) |    11.787    |
|        Nuitka        |    10.039    |
|         PyPy         |     1.78     |
|       prometeo       |     0.657    |
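
The exact benchmark program behind the table is not reproduced here. As a rough sketch of what a Fibonacci micro-benchmark typically looks like, the snippet below times a naive recursive implementation in plain Python. The function names and the problem size are assumptions made for illustration, so the measured time is not comparable with the figures above.

```python
import time


def fib(n: int) -> int:
    """Naive recursive Fibonacci, a common CPU-bound micro-benchmark."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def time_call(n: int) -> float:
    """Return the CPU time in seconds spent computing fib(n)."""
    start = time.process_time()
    fib(n)
    return time.process_time() - start


if __name__ == "__main__":
    n = 25  # hypothetical problem size; the size used for the table is not stated
    print(f"fib({n}) = {fib(n)}, CPU time = {time_call(n):.3f} s")
```

The workload consists almost entirely of function calls and integer arithmetic, which is where per-call interpreter overhead dominates.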

### PyPI installation

prometeo can be installed through PyPI with `pip install prometeo-dsl`. Notice that, since prometeo makes extensive use of [type hints](https://docs.python.org/3.6/library/typing.html) to equip Python code with static typing information, the minimum Python version required is 3.6.

### manual installation

If you want to install prometeo by building the sources on your local machine, proceed as follows:

- Run `git submodule update --init` to clone the submodules.
- Run `make install_shared` from `<prometeo_root>/prometeo/cpmt` to compile and install the shared library associated with the C backend. Note that the default installation path is `<prometeo_root>/prometeo/cpmt/install`.
- You need Python 3.6 or later.
- Optional: to keep things clean you can set up a virtual environment with `virtualenv --python=<path_to_python3> <path_to_new_virtualenv>`.
- Run `pip install -e .` from `<prometeo_root>` to install the Python package.

Finally, you can run the examples in `<prometeo_root>/examples` with `pmt <example_name>.py --cgen=<True/False>`, where the `--cgen` flag determines whether the code is executed by the Python interpreter or C code is generated, compiled and run.

### a simple example

The Python code ([`examples/simple_example/simple_example.py`](https://github.com/zanellia/prometeo/blob/master/examples/simple_example/simple_example.py))

```python
from prometeo import *

n : dims = 10

def main() -> int:

    A: pmat = pmat(n, n)
    for i in range(10):
        for j in range(10):
            A[i, j] = 1.0

    B: pmat = pmat(n, n)
    for i in range(10):
        B[0, i] = 2.0

    C: pmat = pmat(n, n)
    C = A * B
    pmat_print(C)
    return 0
```

can be run by the standard Python interpreter (version 3.6 or later required) with the command `pmt simple_example.py --cgen=False`, in which case it performs the described linear algebra operations. At the same time, the code can be parsed by prometeo and its abstract syntax tree (AST) analyzed in order to generate the following high-performance C code:

```c
#include "stdlib.h"
#include "simple_example.h"
void * ___c_pmt_8_heap;
void * ___c_pmt_64_heap;
void * ___c_pmt_8_heap_head;
void * ___c_pmt_64_heap_head;

#include "prometeo.h"
int main() {
    ___c_pmt_8_heap = malloc(10000);
    ___c_pmt_8_heap_head = ___c_pmt_8_heap;
    char * pmem_ptr = (char *)___c_pmt_8_heap;
    align_char_to(8, &pmem_ptr);
    ___c_pmt_8_heap = pmem_ptr;
    ___c_pmt_64_heap = malloc(1000000);
    ___c_pmt_64_heap_head = ___c_pmt_64_heap;
    pmem_ptr = (char *)___c_pmt_64_heap;
    align_char_to(64, &pmem_ptr);
    ___c_pmt_64_heap = pmem_ptr;
    void *callee_pmt_8_heap = ___c_pmt_8_heap;
    void *callee_pmt_64_heap = ___c_pmt_64_heap;

    struct pmat * A = c_pmt_create_pmat(n, n);
    for(int i = 0; i < 10; i++) {
        for(int j = 0; j < 10; j++) {
            c_pmt_pmat_set_el(A, i, j, 1.0);
        }
    }

    struct pmat * B = c_pmt_create_pmat(n, n);
    for(int i = 0; i < 10; i++) {
        c_pmt_pmat_set_el(B, 0, i, 2.0);
    }

    struct pmat * C = c_pmt_create_pmat(n, n);
    c_pmt_pmat_fill(C, 0.0);
    c_pmt_gemm_nn(A, B, C, C);
    c_pmt_pmat_print(C);
    ___c_pmt_8_heap = callee_pmt_8_heap;
    ___c_pmt_64_heap = callee_pmt_64_heap;

    free(___c_pmt_8_heap_head);
    free(___c_pmt_64_heap_head);
    return 0;
}
```

which relies on the high-performance linear algebra package BLASFEO. The generated code is compiled and run automatically when running `pmt simple_example.py --cgen=True`.
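
The generated C code above reserves two buffers once, aligns them, and saves and restores the heap pointers around the function body. A toy Python model of that pattern is sketched below. The class `BumpHeap` and the function `callee` are hypothetical names, and the assumption that each allocation simply advances an aligned pointer is an illustration, not a description of how `c_pmt_create_pmat` is actually implemented.

```python
class BumpHeap:
    """Toy model of a preallocated heap managed by a moving pointer."""

    def __init__(self, size: int, alignment: int) -> None:
        self.size = size
        self.alignment = alignment
        self.top = 0  # offset of the next free byte

    def alloc(self, nbytes: int) -> int:
        """Reserve `nbytes` and return the aligned start offset."""
        # Round the current top up to the next multiple of the alignment.
        start = -(-self.top // self.alignment) * self.alignment
        if start + nbytes > self.size:
            raise MemoryError("preallocated heap exhausted")
        self.top = start + nbytes
        return start


def callee(heap: BumpHeap) -> int:
    """Allocate scratch memory, then release it by restoring the pointer."""
    saved_top = heap.top      # mirrors `callee_pmt_64_heap = ___c_pmt_64_heap`
    heap.alloc(10 * 10 * 8)   # room for a 10x10 matrix of doubles
    used = heap.top
    heap.top = saved_top      # mirrors `___c_pmt_64_heap = callee_pmt_64_heap`
    return used


heap = BumpHeap(size=1_000_000, alignment=64)
first = heap.alloc(100)         # starts at offset 0, top becomes 100
second = heap.alloc(100)        # start is rounded up to offset 128
print(first, second)            # 0 128
print(callee(heap), heap.top)   # 1056 228
```

Releasing memory is a single pointer assignment, so no per-object bookkeeping or garbage collection is needed.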

### how does it work?

Although translating a program written in one language into another with a comparable level of abstraction can be significantly easier than translating to one with a very different level of abstraction (especially if the target language is of much lower level), translating Python programs into C programs still involves a considerable abstraction gap and is not an easy task in general. Loosely speaking, the challenge lies in the necessity to reimplement, in the target language, features that are natively supported by the source language. In particular, when translating Python to C, the difficulty comes both from the different level of abstraction of the two languages and from the fact that the source and target language are of two very different types: Python is an *interpreted*, *duck-typed* and *garbage-collected* language and C is a *compiled* and *statically typed* language.
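
A short illustration of the duck-typing side of this gap: the standard interpreter stores type hints but does not enforce them, so the same annotated function accepts operands that a C signature would reject. The function `scale` is a hypothetical example.

```python
def scale(x: float, k: float) -> float:
    """Annotated as float arithmetic, but the interpreter does not check it."""
    return x * k


print(scale(1.5, 2.0))        # 3.0
print(scale("ab", 2))         # abab (str * int is valid Python)
print(scale.__annotations__)  # the hints are stored, not enforced
```

A transpiler targeting C therefore has to perform its own analysis of the annotations before it can pick a concrete C type for each variable.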

The task of transpiling Python to C becomes even more challenging if we add the constraint that the generated C code must be efficient (even for small to medium scale computations) and deployable on embedded hardware. In fact, these two requirements directly imply that the generated code cannot make use of: *i)* sophisticated runtime libraries, e.g., the Python runtime library, which are generally not available on embedded hardware, and *ii)* dynamic memory allocation, which would make the execution slow and unreliable (except for memory that is allocated in a setup phase and whose size is known a priori).

Since source-to-source code transformation, or transpilation, and in particular transpilation of Python code into C code, is not an unexplored realm, we mention in the following a few existing projects that address it. In doing so, we highlight where and how they fail to satisfy one of the two requirements outlined above, namely (small scale) efficiency and embeddability.







     

### related work

Several software packages exist that address Python-to-C translation in various forms.

In the context of high-performance computing, *Numba* is a just-in-time compiler for numerical functions written in Python. As such, its aim is to convert properly annotated Python functions, not entire programs, into high-performance LLVM code such that their execution can be sped up. *Numba* uses an internal representation of the code to be translated and performs a (potentially partial) type inference on the variables involved in order to generate LLVM code that can be called either from Python or from C/C++. In some cases, namely the ones where a complete type inference can be carried out successfully, code that does not rely on the C API can be generated (using the *nopython* flag). However, the emitted LLVM code would still rely on *NumPy* for BLAS and LAPACK operations.

*Nuitka* is a source-to-source compiler that can translate every Python construct into C code that links against the *libpython* library, and it is therefore able to transpile a large class of Python programs. In order to do so, it relies on the fact that one of the most used implementations of the Python language, namely *CPython*, is written in C. In fact, *Nuitka* generates C code that contains the calls to *CPython* that would normally be carried out by the Python interpreter. Despite its attractive and general transpilation approach, it cannot be easily deployed on embedded hardware due to its intrinsic dependency on *libpython*. At the same time, since it maps Python constructs rather closely to their *CPython* implementation, a number of performance issues can be expected when it comes to small to medium scale high-performance computing. This is particularly due to the fact that operations associated with, for example, type checking, memory allocation and garbage collection, which can slow down the execution, are carried out by the transpiled program too.

*Cython* is a programming language whose goal is to facilitate writing C extensions for the Python language. In particular, it can translate (optionally) statically typed Python-like code into C code that relies on *CPython*. Similarly to the considerations made for *Nuitka*, this makes it a powerful tool whenever it is possible to rely on *libpython* (and when its overhead is negligible, i.e., when dealing with sufficiently large scale computations), but not in the context of interest here.

Finally, although it does not use Python as source language, we should mention that *Julia* too is just-in-time (and partially ahead-of-time) compiled into LLVM code. The emitted LLVM code, however, relies on the *Julia* runtime library, such that considerations similar to the ones made for *Cython* and *Nuitka* apply.

### *prometeo*'s transpiler

Transpilation of programs written using a restricted subset of the Python language into C programs is carried out using *prometeo*'s transpiler. This source-to-source transformation tool analyzes the abstract syntax trees (AST) associated with the source files to be transpiled in order to emit high-performance and embeddable C code. In order to do so, special rules need to be imposed on the Python code. This makes the otherwise extremely challenging task of transpiling an interpreted high-level duck-typed language into a compiled low-level statically typed one possible. In doing so, we define what is sometimes referred to as an *embedded* DSL, in the sense that the resulting language uses the syntax of a host language (Python itself) and, in *prometeo*'s case, it can also be executed by the standard Python interpreter.
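
As a rough illustration of AST-based analysis, the sketch below uses Python's standard `ast` module to collect annotated declarations and to flag assignments to names that were never declared with a type hint. It is not prometeo's implementation: the function `check_declarations` and the rule it enforces are assumptions chosen for illustration, and scoping is ignored for brevity.

```python
import ast

SOURCE = """
def main() -> int:
    n: int = 10
    x: float = 1.0
    x = 2.0
    y = x * n
    return 0
"""


def check_declarations(source: str):
    """Return (declared, undeclared) for a piece of source code.

    declared   maps each annotated variable name to the name of its type hint
    undeclared lists (line, name) for assignments to never-annotated names
    """
    tree = ast.parse(source)

    declared = {}
    for node in ast.walk(tree):
        # `name: type = value` appears in the AST as an AnnAssign node.
        if (isinstance(node, ast.AnnAssign)
                and isinstance(node.target, ast.Name)
                and isinstance(node.annotation, ast.Name)):
            declared[node.target.id] = node.annotation.id

    undeclared = [
        (node.lineno, target.id)
        for node in ast.walk(tree)
        if isinstance(node, ast.Assign)  # plain `name = value`
        for target in node.targets
        if isinstance(target, ast.Name) and target.id not in declared
    ]
    return declared, undeclared


declared, undeclared = check_declarations(SOURCE)
print(declared)    # {'n': 'int', 'x': 'float'}
print(undeclared)  # [(6, 'y')]
```

Once every variable has a known type, emitting a matching C declaration for it becomes a mechanical step.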







### a more advanced example (Riccati factorization)

```python
from prometeo import *

nx:  dims = 2
nu:  dims = 2
nxu: dims = nx + nu
N:   dims = 5

def main() -> int:

    # number of repetitions for timing
    nrep : int = 10000

    A: pmat = pmat(nx, nx)
    A[0,0] = 0.8
    A[0,1] = 0.1
    A[1,0] = 0.3
    A[1,1] = 0.8

    B: pmat = pmat(nx, nu)
    B[0,0] = 1.0
    B[1,1] = 1.0

    Q: pmat = pmat(nx, nx)
    Q[0,0] = 1.0
    Q[1,1] = 1.0

    R: pmat = pmat(nu, nu)
    R[0,0] = 1.0
    R[1,1] = 1.0

    RSQ: pmat = pmat(nxu, nxu)
    Lxx: pmat = pmat(nx, nx)
    M: pmat = pmat(nxu, nxu)
    w_nxu_nx: pmat = pmat(nxu, nx)
    BAt : pmat = pmat(nxu, nx)
    BA : pmat = pmat(nx, nxu)
    pmat_hcat(B, A, BA)
    pmat_tran(BA, BAt)

    RSQ[0:nu,0:nu] = R
    RSQ[nu:nu+nx,nu:nu+nx] = Q

    # array-type Riccati factorization
    for i in range(nrep):
        pmt_potrf(Q, Lxx)
        M[nu:nu+nx,nu:nu+nx] = Lxx
        for i in range(1, N):
            pmt_trmm_rlnn(Lxx, BAt, w_nxu_nx)
            pmt_syrk_ln(w_nxu_nx, w_nxu_nx, RSQ, M)
            pmt_potrf(M, M)
            Lxx[0:nx,0:nx] = M[nu:nu+nx,nu:nu+nx]

    return 0
```

Similarly, the code above ([`examples/riccati_example/riccati_array.py`](https://github.com/zanellia/prometeo/blob/master/examples/riccati_example/riccati_array.py)) can be run by the standard Python interpreter using the command `pmt riccati_array.py --cgen=False`, while `pmt riccati_array.py --cgen=True` makes prometeo generate, compile and run the C code.
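
To see what the factorization loop computes, it can help to look at the scalar case (one state, one input), where the Cholesky factor of the 2x2 matrix `M` can be written out by hand. The sketch below is plain Python and does not use prometeo. The function names and numerical values are chosen for illustration. It checks that the Cholesky-based step reproduces the textbook Riccati recursion `p_prev = q + a*p*a - (a*p*b)**2 / (r + b*p*b)`.

```python
import math


def riccati_step_cholesky(a, b, q, r, p):
    """One backward Riccati step via the Cholesky factor of M (scalar case)."""
    # M = diag(r, q) + [b a]^T * p * [b a]
    m11 = r + b * p * b
    m21 = a * p * b
    m22 = q + a * p * a
    # Lower-triangular Cholesky factor of the 2x2 matrix M
    l11 = math.sqrt(m11)
    l21 = m21 / l11
    l22 = math.sqrt(m22 - l21 * l21)
    # The trailing block of the factor gives the next cost-to-go matrix
    return l22 * l22


def riccati_step_textbook(a, b, q, r, p):
    """The same step written as the usual Riccati difference equation."""
    return q + a * p * a - (a * p * b) ** 2 / (r + b * p * b)


p_chol = p_ref = 1.0  # terminal value, equal to q
for _ in range(4):
    p_chol = riccati_step_cholesky(0.8, 1.0, 1.0, 1.0, p_chol)
    p_ref = riccati_step_textbook(0.8, 1.0, 1.0, 1.0, p_ref)

print(abs(p_chol - p_ref) < 1e-12)  # True
```

In the matrix case the same structure appears with blocks in place of scalars, which is why the example extracts the trailing `nx`-by-`nx` block of the factor at every stage.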

### supported Python constructs

In order to be able to transpile to C, only a subset of the Python language is supported. However, non C-like features such as function overloading and classes are supported by prometeo's transpiler. The adapted Riccati example ([`examples/riccati_example/riccati_mass_spring_2.py`](https://github.com/zanellia/prometeo/blob/master/examples/riccati_example/riccati_mass_spring_2.py)) below shows how classes can be created and used.

```python
from prometeo import *

nm: dims = 4
nx: dims  = 2*nm
sizes: dimv = [[8,8], [8,8], [8,8], [8,8], [8,8]]
nu: dims  = nm
nxu: dims = nx + nu
N:  dims  = 5

class qp_data:
    A: List = plist(pmat, sizes)
    B: List = plist(pmat, sizes)
    Q: List = plist(pmat, sizes)
    R: List = plist(pmat, sizes)
    P: List = plist(pmat, sizes)

    fact: List = plist(pmat, sizes)

    def factorize(self) -> None:
        M: pmat = pmat(nxu, nxu)
        Mxx: pmat = pmat(nx, nx)
        L: pmat = pmat(nxu, nxu)
        Q: pmat = pmat(nx, nx)
        R: pmat = pmat(nu, nu)
        BA: pmat = pmat(nx, nxu)
        BAtP: pmat = pmat(nxu, nx)
        pmat_copy(self.Q[N-1], self.P[N-1])

        pmat_hcat(self.B[N-1], self.A[N-1], BA)
        pmat_copy(self.Q[N-1], Q)
        pmat_copy(self.R[N-1], R)
        for i in range(1, N):
            pmat_fill(BAtP, 0.0)
            pmt_gemm_tn(BA, self.P[N-i], BAtP, BAtP)

            pmat_fill(M, 0.0)
            M[0:nu,0:nu] = R
            M[nu:nu+nx,nu:nu+nx] = Q

            pmt_gemm_nn(BAtP, BA, M, M)
            pmat_fill(L, 0.0)
            pmt_potrf(M, L)

            Mxx[0:nx, 0:nx] = L[nu:nu+nx, nu:nu+nx]

            # pmat_fill(self.P[N-i-1], 0.0)
            pmt_gemm_nt(Mxx, Mxx, self.P[N-i-1], self.P[N-i-1])
            # pmat_print(self.P[N-i-1])

        return

def main() -> int:

    A: pmat = pmat(nx, nx)
    Ac11 : pmat = pmat(nm,nm)
    Ac12 : pmat = pmat(nm,nm)
    for i in range(nm):
        Ac12[i,i] = 1.0

    Ac21 : pmat = pmat(nm,nm)
    for i in range(nm):
        Ac21[i,i] = -2.0

    for i in range(nm-1):
        Ac21[i+1,i] = 1.0
        Ac21[i,i+1] = 1.0

    Ac22 : pmat = pmat(nm,nm)

    for i in range(nm):
        for j in range(nm):
            A[i,j] = Ac11[i,j]

    for i in range(nm):
        for j in range(nm):
            A[i,nm+j] = Ac12[i,j]

    for i in range(nm):
        for j in range(nm):
            A[nm+i,j] = Ac21[i,j]

    for i in range(nm):
        for j in range(nm):
            A[nm+i,nm+j] = Ac22[i,j]

    tmp : float = 0.0
    for i in range(nx):
        tmp = A[i,i]
        tmp = tmp + 1.0
        A[i,i] = tmp

    B: pmat = pmat(nx, nu)

    for i in range(nu):
        B[nm+i,i] = 1.0

    Q: pmat = pmat(nx, nx)
    for i in range(nx):
        Q[i,i] = 1.0

    R: pmat = pmat(nu, nu)
    for i in range(nu):
        R[i,i] = 1.0

    qp : qp_data = qp_data()

    for i in range(N):
        qp.A[i] = A

    for i in range(N):
        qp.B[i] = B

    for i in range(N):
        qp.Q[i] = Q

    for i in range(N):
        qp.R[i] = R

    qp.factorize()

    return 0
```

__Disclaimer: prometeo is still at a very preliminary stage and only a few linear algebra operations and Python constructs are supported for the time being.__