{"id":13800931,"url":"https://github.com/saltudelft/type4py","last_synced_at":"2025-05-13T10:30:40.298Z","repository":{"id":39973187,"uuid":"309953458","full_name":"saltudelft/type4py","owner":"saltudelft","description":"Type4Py: Deep Similarity Learning-Based Type Inference for Python","archived":false,"fork":false,"pushed_at":"2023-09-06T12:40:40.000Z","size":206,"stargazers_count":61,"open_issues_count":6,"forks_count":12,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-11-18T15:55:51.182Z","etag":null,"topics":["deeplearning","machinelearning","ml4se","python","similarity-learning","type4py","typeinference"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saltudelft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-11-04T09:20:30.000Z","updated_at":"2024-08-12T05:22:21.000Z","dependencies_parsed_at":"2024-01-29T19:30:20.555Z","dependency_job_id":"da2dc3a0-c624-4084-a1e1-7da265af1fe4","html_url":"https://github.com/saltudelft/type4py","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saltudelft%2Ftype4py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saltudelft%2Ftype4py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saltudelft%2Ftype4py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saltudelft%2Ftype4py/manifests","owner_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/owners/saltudelft","download_url":"https://codeload.github.com/saltudelft/type4py/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253922790,"owners_count":21984780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deeplearning","machinelearning","ml4se","python","similarity-learning","type4py","typeinference"],"created_at":"2024-08-04T00:01:17.758Z","updated_at":"2025-05-13T10:30:39.991Z","avatar_url":"https://github.com/saltudelft.png","language":"Python","readme":"# Type4Py: Deep Similarity Learning-Based Type Inference for Python\n![GH Workflow](https://github.com/saltudelft/type4py/actions/workflows/.github/workflows/type4py_server_test.yaml/badge.svg)\n![GH Workflow](https://github.com/saltudelft/type4py/actions/workflows/.github/workflows/publish_type4py_docker_img.yaml/badge.svg)\n\nThis repository contains the implementation of Type4Py and instructions for reproducing the results of the paper.\n\n- [Dataset](#dataset)\n- [Installation Guide](#installation-guide)\n- [Usage Guide](#usage-guide)\n- [Converting Type4Py to ONNX](#converting-type4py-to-onnx)\n- [VSCode Extension](#vscode-extension)\n- [Using Local Pre-trained Model](#using-local-pre-trained-model)\n- [Type4Py Server](#type4py-server)\n- [Citing Type4Py](#citing-type4py)\n\n# Dataset\nFor Type4Py, we use the **ManyTypes4Py** dataset. 
You can download the latest version of the dataset [here](https://doi.org/10.5281/zenodo.4044635).\nAlso, note that the dataset is already de-duplicated.\n\n## Code De-duplication\nIf you want to use your own dataset,\nit is essential to de-duplicate it with a tool like [CD4Py](https://github.com/saltudelft/CD4Py).\n\n# Installation Guide\n## Requirements\nHere are the recommended system requirements for training Type4Py on the ManyTypes4Py (MT4Py) dataset:\n- Linux-based OS (Ubuntu 18.04 or newer)\n- Python 3.6 or newer\n- A high-end NVIDIA GPU (w/ at least 8GB of VRAM)\n- A CPU with 16 threads or more (w/ at least 64GB of RAM)\n\n## Quick Install\n```\ngit clone https://github.com/saltudelft/type4py.git \u0026\u0026 cd type4py\npip install .\n```\n\n# Usage Guide\nFollow the steps below to train and evaluate the Type4Py model.\n## 1. Extraction\n**NOTE:** Skip this step if you're using the ManyTypes4Py dataset.\n```\n$ type4py extract --c $DATA_PATH --o $OUTPUT_DIR --d $DUP_FILES --w $CORES\n```\nDescription:\n- `$DATA_PATH`: The path to the Python corpus or dataset.\n- `$OUTPUT_DIR`: The path to store processed projects.\n- `$DUP_FILES`: The path to the duplicate files, i.e., the `*.jsonl.gz` file produced by CD4Py. [Optional]\n- `$CORES`: Number of CPU cores to use for processing projects.\n\n## 2. Preprocessing\n```\n$ type4py preprocess --o $OUTPUT_DIR --l $LIMIT\n```\nDescription:\n- `$OUTPUT_DIR`: The path that was used in the first step to store processed projects. For the MT4Py dataset, use the directory in which the dataset is extracted.\n- `$LIMIT`: The number of projects to be processed. [Optional]\n\n## 3. Vectorizing\n```\n$ type4py vectorize --o $OUTPUT_DIR\n```\nDescription:\n- `$OUTPUT_DIR`: The path that was used in the previous step to store processed projects.\n\n## 4. 
Learning\n```\n$ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE\n```\nDescription:\n- `$OUTPUT_DIR`: The path that was used in the previous step to store processed projects.\n- `--c`: Trains the complete model. Use `type4py learn -h` to see other configurations.\n- `--p $PARAM_FILE`: The path to user-provided hyper-parameters for the model. See [this](https://github.com/saltudelft/type4py/blob/main/type4py/model_params.json) file as an example. [Optional]\n\n## 5. Testing\n```\n$ type4py predict --o $OUTPUT_DIR --c\n```\n\nDescription:\n- `$OUTPUT_DIR`: The path that was used in the first step to store processed projects.\n- `--c`: Predicts using the complete model. Use `type4py predict -h` to see other configurations.\n\n## 6. Evaluating\n```\n$ type4py eval --o $OUTPUT_DIR --t c --tp 10\n```\n\nDescription:\n- `$OUTPUT_DIR`: The path that was used in the first step to store processed projects.\n- `--t`: Evaluates the model considering different prediction tasks. E.g., `--t c` considers all prediction tasks,\n  i.e., parameters, return types, and variables. [Default: c]\n- `--tp 10`: Considers Top-10 predictions for evaluation. For this argument, you can choose a positive integer between 1 and 10. 
[Default: 10]\n\nUse `type4py eval -h` to see other options.\n\n## Reduce\nTo reduce the dimension of the type clusters created in step 5, run the following command:\n\u003e Note: The reduced version of type clusters causes a slight performance loss in type prediction.\n```\n$ type4py reduce --o $OUTPUT_DIR --d $DIMENSION\n```\n\nDescription:\n- `$OUTPUT_DIR`: The path that was used in the first step to store processed projects.\n- `$DIMENSION`: Reduces the dimension of type clusters to the specified value. [Default: 256]\n\n# Converting Type4Py to ONNX\nTo convert the pre-trained Type4Py model to the [ONNX](https://onnxruntime.ai/) format, use the following command:\n```\n$ type4py to_onnx --o $OUTPUT_DIR\n```\nDescription:\n- `$OUTPUT_DIR`: The path that was used in the [usage](#usage-guide) section to store processed projects and the model.\n\n# VSCode Extension\n[![vsm-version](https://img.shields.io/visual-studio-marketplace/v/saltud.type4py?style=flat\u0026label=VS%20Marketplace\u0026logo=visual-studio-code)](https://marketplace.visualstudio.com/items?itemName=saltud.type4py)\n\nType4Py is available as a VSCode extension that provides ML-based type auto-completion for Python files. The extension can be installed from the VS Marketplace [here](https://marketplace.visualstudio.com/items?itemName=saltud.type4py).\n\n# Using Local Pre-trained Model\nType4Py's pre-trained model can be queried locally by using the provided Docker images. 
See [here](https://github.com/saltudelft/type4py/wiki/Type4Py's-Local-Model) for usage info.\n\n# Type4Py Server\n![GH Workflow](https://github.com/saltudelft/type4py/actions/workflows/.github/workflows/type4py_server_test.yaml/badge.svg)\n\nThe Type4Py server is deployed on our server, which exposes a public API and powers the VSCode extension.\nHowever, if you would like to deploy the Type4Py server on your own machine, you can adapt the server code [here](https://github.com/saltudelft/type4py/tree/server/type4py/server).\nAlso, please feel free to create an [issue](https://github.com/saltudelft/type4py/issues) if you have questions about deployment, using the pre-trained Type4Py model, or training your own model.\n\n# Citing Type4Py\n\n```\n@inproceedings{mir2022type4py,\n  title={Type4Py: practical deep similarity learning-based type inference for python},\n  author={Mir, Amir M and Lato{\\v{s}}kinas, Evaldas and Proksch, Sebastian and Gousios, Georgios},\n  booktitle={Proceedings of the 44th International Conference on Software Engineering},\n  pages={2241--2252},\n  year={2022}\n}\n```","funding_links":[],"categories":["Python","Tools"],"sub_categories":["Helper tools to add annotations to existing code"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaltudelft%2Ftype4py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaltudelft%2Ftype4py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaltudelft%2Ftype4py/lists"}