https://github.com/secure-software-engineering/typeevalpy
A Micro-benchmarking Framework for Python Type Inference Tools
- Host: GitHub
- URL: https://github.com/secure-software-engineering/typeevalpy
- Owner: secure-software-engineering
- Created: 2023-06-15T13:39:06.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-04T12:33:24.000Z (3 months ago)
- Topics: benchmark, python, staticanalysis, typeinference
- Language: Python
- Size: 29.3 MB
- Stars: 33
- Watchers: 5
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# A Micro-benchmarking Framework for Python Type Inference Tools
## 📌 **Features**:
- 📜 Contains **154 code snippets** to test and benchmark.
- 🏷 Offers **845 type annotations** across a diverse set of Python functionalities.
- 📂 Organized into **18 distinct categories** targeting various Python features.
- 🚢 Seamlessly manages the execution of **containerized tools**.
- 🔄 Efficiently transforms inferred types into a **standardized format** (see the sketch after this list).
- 📊 Automatically produces **meaningful metrics** for in-depth assessment and comparison.
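To make the standardized format and the exact-match metric concrete, here is a minimal sketch. The entry fields (`file`, `line_number`, `col_offset`, `function`, `type`) mirror the general shape of the benchmark's ground-truth JSON files, but they are an assumption for illustration; the JSON files in the repository are the authoritative reference.

```python
# Illustrative sketch only: field names follow the general shape of the
# benchmark's ground-truth JSON files and are an assumption, not a spec.
ground_truth = [
    {"file": "main.py", "line_number": 1, "col_offset": 5,
     "function": "add", "type": ["int"]},
]

inferred = [
    {"file": "main.py", "line_number": 1, "col_offset": 5,
     "function": "add", "type": ["int"]},
]


def exact_matches(truth, predictions):
    """Count ground-truth slots whose predicted type matches exactly."""
    def key(entry):
        return (entry["file"], entry["line_number"],
                entry["col_offset"], entry.get("function"))

    predicted = {key(e): e["type"] for e in predictions}
    return sum(1 for e in truth if predicted.get(key(e)) == e["type"])


print(exact_matches(ground_truth, inferred))  # -> 1
```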
### [New] TypeEvalPy Autogen

- 🤖 **Autogenerates code snippets** and ground truth to scale the benchmark based on the original `TypeEvalPy` benchmark.
- 📈 The autogen benchmark now contains:
- **Python files**: 7121
- **Type annotations**: 78373

## 🛠️ Supported Tools
| Supported :white_check_mark: | In-progress :wrench: | Planned :bulb: |
| --- | --- | --- |
| [HeaderGen](https://github.com/secure-software-engineering/HeaderGen) | [Intellij PSI](https://plugins.jetbrains.com/docs/intellij/psi.html) | [MonkeyType](https://github.com/Instagram/MonkeyType) |
| [Jedi](https://github.com/davidhalter/jedi) | [Pyre](https://github.com/facebook/pyre-check) | [Pyannotate](https://github.com/dropbox/pyannotate) |
| [Pyright](https://github.com/microsoft/pyright) | [PySonar2](https://github.com/yinwang0/pysonar2) | |
| [HiTyper](https://github.com/JohnnyPeng18/HiTyper) | [Pytype](https://github.com/google/pytype) | |
| [Scalpel](https://github.com/SMAT-Lab/Scalpel/issues) | [TypeT5](https://github.com/utopia-group/TypeT5) | |
| [Type4Py](https://github.com/saltudelft/type4py) | | |
| [GPT](https://openai.com) | | |
| [Ollama](https://ollama.ai) | | |

---
## 🏆 TypeEvalPy Leaderboard
Below is a comparison showcasing exact matches across different tools and LLMs on the Autogen benchmark.
| Rank | 🛠️ Tool | Function Return Type | Function Parameter Type | Local Variable Type | Total |
| ---- | ---------------------------------------------------------------------------------------------- | -------------------- | ----------------------- | ------------------- | ----- |
| 1 | **[mistral-large-it-2407-123b](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407)** | 16701 | 728 | 57550 | 74979 |
| 2 | **[qwen2-it-72b](https://huggingface.co/Qwen/Qwen2-72B-Instruct)** | 16488 | 629 | 55160 | 72277 |
| 3 | **[llama3.1-it-70b](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)** | 16648 | 580 | 54445 | 71673 |
| 4 | **[gemma2-it-27b](https://huggingface.co/google/gemma-2-27b-it)** | 16342 | 599 | 49772 | 66713 |
| 5 | **[codestral-v0.1-22b](https://huggingface.co/mistralai/Codestral-22B-v0.1)** | 16456 | 706 | 49379 | 66541 |
| 6 | **[codellama-it-34b](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf)** | 15960 | 473 | 48957 | 65390 |
| 7 | **[mistral-nemo-it-2407-12.2b](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)** | 16221 | 526 | 48439 | 65186 |
| 8 | **[mistral-v0.3-it-7b](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)** | 16686 | 472 | 47935 | 65093 |
| 9 | **[phi3-medium-it-14b](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)** | 16802 | 467 | 45121 | 62390 |
| 10 | **[llama3.1-it-8b](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)** | 16125 | 492 | 44313 | 60930 |
| 11 | **[codellama-it-13b](https://huggingface.co/meta-llama/CodeLlama-13b-Instruct-hf)** | 16214 | 479 | 43021 | 59714 |
| 12 | **[phi3-small-it-7.3b](https://huggingface.co/microsoft/Phi-3-small-128k-instruct)** | 16155 | 422 | 38093 | 54670 |
| 13 | **[qwen2-it-7b](https://huggingface.co/Qwen/Qwen2-7B-Instruct)** | 15684 | 313 | 38109 | 54106 |
| 14 | **[HeaderGen](https://github.com/ashwinprasadme/headergen)** | 14086 | 346 | 36370 | 50802 |
| 15 | **[phi3-mini-it-3.8b](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)** | 15908 | 320 | 30341 | 46569 |
| 16 | **[phi3.5-mini-it-3.8b](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)** | 15763 | 362 | 28694 | 44819 |
| 17 | **[codellama-it-7b](https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf)** | 13779 | 318 | 29346 | 43443 |
| 18 | **[Jedi](https://github.com/davidhalter/jedi)** | 13160 | 0 | 15403 | 28563 |
| 19 | **[Scalpel](https://github.com/SMAT-Lab/Scalpel/issues)** | 15383 | 171 | 18 | 15572 |
| 20 | **[gemma2-it-9b](https://huggingface.co/google/gemma-2-9b-it)** | 1611 | 66 | 5464 | 7141 |
| 21 | **[Type4Py](https://github.com/saltudelft/type4py)** | 3143 | 38 | 2243 | 5424 |
| 22 | **[tinyllama-1.1b](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)** | 1514 | 28 | 2699 | 4241 |
| 23 | **[mixtral-v0.1-it-8x7b](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)** | 3235 | 33 | 377 | 3645 |
| 24 | **[phi3.5-moe-it-41.9b](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)** | 3090 | 25 | 273 | 3388 |
| 25 | **[gemma2-it-2b](https://huggingface.co/google/gemma-2-2b-it)** | 1497 | 41 | 1848 | 3386 |

_(Auto-generated based on the analysis run on 30 Aug 2024)_
---
## :whale: Running with Docker
### 1️⃣ Clone the repo
```bash
git clone https://github.com/secure-software-engineering/TypeEvalPy.git
```

### 2️⃣ Build Docker image
```bash
docker build -t typeevalpy .
```

### 3️⃣ Run TypeEvalPy
🕒 The first run takes about 30 minutes to build the Docker containers.
📂 Results will be generated in the `results` folder within the root directory of the repository.
Each results folder will have a timestamp, allowing you to easily track and compare different runs.
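For convenience, the sketch below shows one way to pick up the most recent timestamped results folder and read one of the generated CSV tables described below. The exact layout (`results/<timestamp>/paper_table_1.csv`) is an assumption; adjust the paths to match what your run actually produces.

```python
import csv
from pathlib import Path

# Assumption: each run writes a timestamped subfolder under ./results
# containing the auto-generated CSV tables described below.
results_root = Path("results")
latest_run = max(results_root.iterdir(), key=lambda p: p.stat().st_mtime)

table_path = latest_run / "paper_table_1.csv"  # hypothetical location within the run folder
with table_path.open(newline="") as fh:
    for row in csv.DictReader(fh):
        print(row)
```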
#### Correlation of CSV Files Generated to Tables in ICSE Paper

Here is how the auto-generated CSV tables relate to the paper's tables:

- **Table 1** in the paper is derived from three auto-generated CSV tables:
  - `paper_table_1.csv` - details Exact matches by type category.
  - `paper_table_2.csv` - lists Exact matches for 18 micro-benchmark categories.
  - `paper_table_3.csv` - provides Sound and Complete values for tools.

- **Table 2** in the paper is based on the following CSV table:
  - `paper_table_5.csv` - shows Exact matches with top_n values for machine learning tools.

Additionally, there are CSV tables that are _not_ included in the paper:
- `paper_table_4.csv` - containing Sound and Complete values for 18 micro-benchmark categories.
- `paper_table_6.csv` - featuring Sensitivity analysis.

To run the analysis on all supported tools:

```bash
docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ./results:/app/results \
typeevalpy
```

🔧 **Optionally**, run analysis on specific tools:
```bash
docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ./results:/app/results \
typeevalpy --runners headergen scalpel
```

📊 Run analysis on custom benchmarks:
For example, the following runs HeaderGen on the autogen benchmark:
```bash
docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ./results:/app/results \
typeevalpy \
--runners headergen \
--custom_benchmark_dir /app/autogen_typeevalpy_benchmark
```

🛠️ Available `--runners` options: `headergen`, `pyright`, `scalpel`, `jedi`, `hityper`, `type4py`, `hityperdl`
### 🤖 Running TypeEvalPy with LLMs
TypeEvalPy integrates with LLMs through Ollama, streamlining their management. Begin by setting up your environment:
- **Create a configuration file**: copy `config_template.yaml` from the `src` directory and rename it to `config.yaml`.
In `config.yaml`, configure the following (see the sketch after this list for a quick sanity check):
- `openai_key`: your key for accessing OpenAI's models.
- `ollama_url`: the URL for your Ollama instance. For simplicity, we recommend deploying Ollama using their Docker container. [Get started with Ollama here](https://hub.docker.com/r/ollama/ollama).
- `prompt_id`: set this to `questions_based_2` for optimal performance, based on our tests.
- `ollama_models`: a list of model tags from the [Ollama library](https://ollama.com/library). For smoother operation, ensure each model is pre-downloaded with the `ollama pull` command.
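Before launching the run, it can help to sanity-check the configuration. The snippet below is a minimal sketch, assuming `config.yaml` lives in the `src` directory and uses exactly the keys listed above; it requires PyYAML.

```python
import yaml  # PyYAML

# Minimal sanity check for the keys described above; path and key names are
# taken from this README rather than a fixed schema.
with open("src/config.yaml") as fh:
    config = yaml.safe_load(fh)

for key in ("openai_key", "ollama_url", "prompt_id", "ollama_models"):
    if key not in config:
        print(f"config.yaml is missing key: {key}")

print("Ollama models to run:", config.get("ollama_models", []))
```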
With `config.yaml` configured, run the following command:

```bash
docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ./results:/app/results \
typeevalpy --runners ollama
```

---
## Running From Source
## 1. 📥 Installation
1. **Clone the repo**
```bash
git clone https://github.com/secure-software-engineering/TypeEvalPy.git
```

2. **Install Dependencies and Set Up Virtual Environment**
Run the following commands to create and activate a virtual environment, then install the dependencies.
```bash
python3 -m venv .env
```

```bash
source .env/bin/activate
```

```bash
pip install -r requirements.txt
```

---
## 2. 🚀 Usage: Running the Analysis
1. **Navigate to the `src` Directory**
```bash
cd src
```

2. **Execute the Analyzer**
Run the following command to start the benchmarking process on all tools:
```bash
python main_runner.py
```

Or, run the analysis on specific tools:
```bash
python main_runner.py --runners headergen scalpel
```

---
## Running TypeEvalPy Autogen
To generate an extended version of the original TypeEvalPy benchmark that covers many more Python types, run the following commands:
1. **Navigate to the `autogen` Directory**
```bash
cd autogen
```

2. **Execute the Generation Script**
Run the following command to start the generation process:
```bash
python generate_typeevalpy_dataset.py
```

This will generate a folder in the repository root containing the autogen benchmark, named with the current date.
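To get a quick sense of what was generated, a small sketch like the one below counts the Python snippets in the new folder. The folder name pattern (`autogen_typeevalpy_benchmark*` in the repository root) is an assumption based on the Docker example above; adjust it to the name the script actually reports.

```python
from pathlib import Path

# Assumption: the generated benchmark lands in the repository root in a folder
# whose name starts with "autogen_typeevalpy_benchmark"; run this from autogen/.
repo_root = Path("..")
candidates = sorted(repo_root.glob("autogen_typeevalpy_benchmark*"))
if candidates:
    benchmark_dir = candidates[-1]
    py_files = list(benchmark_dir.rglob("*.py"))
    print(f"{benchmark_dir.name}: {len(py_files)} Python snippets")
else:
    print("No autogen benchmark folder found yet.")
```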
---
### 🤝 Contributing
Thank you for your interest in contributing! To add support for a new tool, please use the Docker templates provided in our repository. After implementing and testing your tool, submit a pull request (PR) with a descriptive message. Our maintainers will review your submission and merge it.
To get started with integrating your tool, please follow the guide here: [docs/Tool_Integration_Guide.md](docs/Tool_Integration_Guide.md)
---
### ⭐️ Show Your Support
Give a ⭐️ if this project helped you!