https://github.com/saltudelft/libsa4py
LibSA4Py: Light-weight static analysis for extracting type hints and features
ast-analysis features-extraction libsa4py light-weight machine-learning python static-analysis type-hints
- Host: GitHub
- URL: https://github.com/saltudelft/libsa4py
- Owner: saltudelft
- License: apache-2.0
- Created: 2020-09-14T09:16:31.000Z
- Default Branch: master
- Last Pushed: 2023-08-23T08:06:43.000Z
- Last Synced: 2025-11-27T18:30:23.847Z
- Topics: ast-analysis, features-extraction, libsa4py, light-weight, machine-learning, python, static-analysis, type-hints
- Language: Python
- Size: 454 KB
- Stars: 12
- Watchers: 4
- Forks: 6
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Intro
`LibSA4Py` is a static analysis library for Python, which extracts type hints and features for training ML-based type inference models.
- [Requirements](#requirements)
- [Quick Installation](#quick-installation)
- [Usage](#usage)
- [Processing projects](#processing-projects)
- [Merging projects](#merging-projects)
- [Applying types](#applying-types)
- [JSON Output](#json-output)
# Requirements
- Python 3.7 or newer (Python 3.8 is recommended)
- [Watchman](https://facebook.github.io/watchman/) (for running [pyre](https://pyre-check.org/)) [**Optional**]
- macOS or Linux
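The requirements above can be checked up front with a small pre-flight script. This is a minimal sketch that uses only the Python standard library; the function name `check_requirements` is hypothetical and is not part of LibSA4Py.

```python
import platform
import shutil
import sys


def check_requirements() -> dict:
    """Hypothetical helper (not part of LibSA4Py): check this machine against the README requirements."""
    return {
        # Python 3.7 or newer is required.
        "python_ok": sys.version_info >= (3, 7),
        # Python 3.8 is the recommended version.
        "python_recommended": sys.version_info[:2] == (3, 8),
        # Only macOS (reported as "Darwin") and Linux are listed as supported.
        "os_ok": platform.system() in ("Darwin", "Linux"),
        # Watchman is optional; it is only needed for running pyre (--pyre).
        "watchman_found": shutil.which("watchman") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A missing `watchman` is not an error by itself; it only matters if you plan to pass `--pyre` when processing projects.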
# Quick Installation
```
git clone https://github.com/saltudelft/libsa4py.git
cd libsa4py && pip install .
```
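After installation, a quick sanity check is to confirm that the package is importable as an installed distribution and that the `libsa4py` command is on your `PATH`. This sketch assumes the installed distribution is named `libsa4py` and needs Python 3.8+ (the recommended version) for `importlib.metadata`; the helper names are hypothetical.

```python
import shutil
from importlib import metadata  # in the standard library from Python 3.8
from typing import Optional


def installed_libsa4py_version() -> Optional[str]:
    """Hypothetical helper: installed libsa4py version, or None if it is not installed."""
    try:
        # Assumes the distribution name is "libsa4py".
        return metadata.version("libsa4py")
    except metadata.PackageNotFoundError:
        return None


def libsa4py_cli_on_path() -> bool:
    """Hypothetical helper: True if the `libsa4py` command used in this README is on PATH."""
    return shutil.which("libsa4py") is not None
```

If the version is found but the command is not, the `pip` scripts directory is probably missing from your `PATH`.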
# Usage
## Processing projects
Given a directory of Python repositories, run the following command to process their source files and generate JSON-formatted output:
```
libsa4py process --p $REPOS_PATH --o $OUTPUT_PATH --d $DUPLICATE_PATH --j $WORKERS_COUNT --l $LIMIT --c --no-nlp --pyre
```
Description:
- `--p $REPOS_PATH`: The path to the Python corpus or dataset.
- `--o $OUTPUT_PATH`: Path to store processed projects.
- `--d $DUPLICATE_PATH`: Path to the file listing duplicate files in the dataset (i.e., the `jsonl.gz` file produced by the [CD4Py](https://github.com/saltudelft/CD4Py) tool). [**Optional**]
- `--s`: Path to a CSV file for splitting the dataset. [**Optional**]
- `--j $WORKERS_COUNT`: Number of workers for processing projects. [**Optional**, default=number of available CPU cores]
- `--l $LIMIT`: Number of projects to process. [**Optional**]
- `--c`: If set, skips projects that have already been processed. [**Optional**, default=False]
- `--no-nlp`: If set, disables the standard NLP techniques that are otherwise applied to extracted identifiers. [**Optional**, NLP is applied by default]
- `--pyre`: If set, runs `pyre` to infer the types of variables in the given projects. [**Optional**, default=False]
- `--tc`: If set, type-checks the type annotations in the projects. [**Optional**, default=False]
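When processing is scripted (for example, across several corpora), the options above can be assembled into an argument list and passed to `subprocess`. This is a sketch under the assumption that `libsa4py` is on your `PATH`; it uses only the flags documented above, and `build_process_command` / `run_process` are hypothetical helpers, not part of LibSA4Py.

```python
import subprocess
from typing import Any, List, Optional


def build_process_command(
    repos_path: str,
    output_path: str,
    duplicates_path: Optional[str] = None,
    split_csv: Optional[str] = None,
    workers: Optional[int] = None,
    limit: Optional[int] = None,
    skip_processed: bool = False,
    no_nlp: bool = False,
    pyre: bool = False,
    type_check: bool = False,
) -> List[str]:
    """Hypothetical helper: argv for `libsa4py process`, using only the documented flags."""
    cmd = ["libsa4py", "process", "--p", repos_path, "--o", output_path]
    # Optional flags that take a value are passed only when a value is given.
    if duplicates_path is not None:
        cmd += ["--d", duplicates_path]
    if split_csv is not None:
        cmd += ["--s", split_csv]
    if workers is not None:
        cmd += ["--j", str(workers)]
    if limit is not None:
        cmd += ["--l", str(limit)]
    # Boolean switches take no value; they are appended only when enabled.
    if skip_processed:
        cmd.append("--c")
    if no_nlp:
        cmd.append("--no-nlp")
    if pyre:
        cmd.append("--pyre")
    if type_check:
        cmd.append("--tc")
    return cmd


def run_process(**kwargs: Any) -> None:
    """Run `libsa4py process`; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_process_command(**kwargs), check=True)


# Example with hypothetical paths:
# run_process(repos_path="repos/", output_path="output/", workers=4, skip_processed=True)
```

Building the argument list separately from running it keeps the flag logic easy to inspect before launching a long processing job.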
## Merging projects
To merge all the processed JSON-formatted projects into a single dataframe, run the following command:
```
libsa4py merge --o $OUTPUT_PATH --l $LIMIT
```
Description:
- `--o $OUTPUT_PATH`: Path to the processed projects, used in the previous processing step.
- `--l $LIMIT`: Number of projects to be merged. [**Optional**]
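The merge step takes only two options, so a scripted version is short. This sketch assumes `libsa4py` is on your `PATH`; `build_merge_command` and `run_merge` are hypothetical helper names.

```python
import subprocess
from typing import List, Optional


def build_merge_command(output_path: str, limit: Optional[int] = None) -> List[str]:
    """Hypothetical helper: argv for `libsa4py merge` from the two documented options."""
    # --o must be the same directory that `libsa4py process` wrote to.
    cmd = ["libsa4py", "merge", "--o", output_path]
    if limit is not None:
        # Optional: merge only this many processed projects.
        cmd += ["--l", str(limit)]
    return cmd


def run_merge(output_path: str, limit: Optional[int] = None) -> None:
    """Run `libsa4py merge`; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_merge_command(output_path, limit), check=True)
```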
## Applying types
To apply Pyre's inferred types to projects, run the following command:
```
libsa4py apply --p $REPOS_PATH --o $OUTPUT_PATH
```
Description:
- `--p $REPOS_PATH`: The path to the Python corpus or dataset.
- `--o $OUTPUT_PATH`: Path to the processed projects, used in the previous processing step.
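Since `apply` uses Pyre's inferred types, it presumably needs projects that were processed with `--pyre` first. The sketch below chains the two commands under that assumption, reusing the same `--p` and `--o` paths for both steps; `build_pipeline` and `run_pipeline` are hypothetical helpers.

```python
import subprocess
from typing import List


def build_pipeline(repos_path: str, output_path: str) -> List[List[str]]:
    """Hypothetical helper: commands to infer types with pyre, then apply them.

    Assumes `apply` needs projects previously processed with --pyre.
    """
    return [
        # Step 1: process the corpus and let pyre infer the types of variables.
        ["libsa4py", "process", "--p", repos_path, "--o", output_path, "--pyre"],
        # Step 2: apply pyre's inferred types to the projects.
        ["libsa4py", "apply", "--p", repos_path, "--o", output_path],
    ]


def run_pipeline(repos_path: str, output_path: str) -> None:
    for cmd in build_pipeline(repos_path, output_path):
        # check=True stops the pipeline if a step exits with a non-zero status.
        subprocess.run(cmd, check=True)
```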
# JSON Output
Processing each project produces a JSON file, whose format is described in [JSONOutput.md](https://github.com/saltudelft/light-sa-type-inf/blob/master/JSONOutput.md).
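To inspect the processed output from Python, the files can be loaded with the standard `json` module. This is a sketch that assumes the processed projects are stored as `*.json` files somewhere under `$OUTPUT_PATH`; it does not interpret the schema (see the format description linked above), and `load_processed_projects` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_processed_projects(output_path: str) -> Dict[str, Any]:
    """Hypothetical helper: load every *.json file under output_path.

    Results are keyed by each file's path relative to output_path.
    Assumes the processed projects are stored as *.json files.
    """
    root = Path(output_path)
    projects: Dict[str, Any] = {}
    # rglob searches subdirectories recursively; sorting gives a stable order.
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            projects[str(path.relative_to(root))] = json.load(f)
    return projects
```

For large corpora, loading every file at once may use a lot of memory; iterating over `rglob` and processing one file at a time is a straightforward alternative.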