https://github.com/labmlai/python_autocomplete
Use Transformers and LSTMs to learn Python source code
- Host: GitHub
- URL: https://github.com/labmlai/python_autocomplete
- Owner: labmlai
- License: MIT
- Created: 2020-08-08T08:50:00.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-01-18T11:14:57.000Z (almost 4 years ago)
- Last Synced: 2025-06-06T12:08:22.149Z (7 months ago)
- Topics: autocomplete-python, deep-learning, deep-learning-tutorial, nlp, pytorch
- Language: Jupyter Notebook
- Homepage:
- Size: 13.6 MB
- Stars: 191
- Watchers: 9
- Forks: 43
- Open Issues: 3
Metadata Files:
- Readme: readme.md
- License: LICENSE
[PyPI](https://badge.fury.io/py/labml-python-autocomplete)
[Downloads](https://pepy.tech/project/labml-python-autocomplete)
[Slack](https://join.slack.com/t/labforml/shared_invite/zt-egj9zvq9-Dl3hhZqobexgT7aVKnD14g/)
[Twitter](https://twitter.com/labmlai?ref_src=twsrc%5Etfw)
# Python Autocomplete
Watch [the full-length Python autocompletion video](https://www.youtube.com/watch?v=ZFzxBPBUh0M) and read [a Twitter thread describing how it works](https://twitter.com/labmlai/status/1367444214963838978).
This is a learning/demo project that shows how deep learning can be used to autocomplete Python code.
You can experiment with LSTM and Transformer models.
We have also built a simple VSCode extension to try out the trained models.
Train a model: [Open in Colab](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/train.ipynb)
Evaluate a trained model: [Open in Colab](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/evaluate.ipynb)
It gives quite decent results, saving over 30% of keystrokes in most files and close to 50% in some.
We calculated keystrokes saved by making a single (best) prediction and selecting it with a single key.
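The metric above can be sketched as a simulation: walk through a file, and whenever the model's single best completion matches the upcoming text, charge one key press for the whole span. This is a minimal sketch of the idea, not the repo's actual evaluation code; `predict` is a hypothetical stand-in for the trained model.

```python
def keystrokes_saved(text: str, predict) -> float:
    """Fraction of keystrokes saved, assuming each accepted
    prediction costs one key press (e.g. TAB)."""
    i, pressed = 0, 0
    while i < len(text):
        guess = predict(text[:i])  # model's best completion string
        if guess and text.startswith(guess, i):
            pressed += 1           # one key accepts the whole guess
            i += len(guess)
        else:
            pressed += 1           # type the next character manually
            i += 1
    return 1 - pressed / len(text)
```

With a toy predictor that always suggests `"ab"`, the text `"ababab"` costs 3 key presses instead of 6, a 50% saving.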
The dataset we use is the Python code found in the repos linked in
[Awesome-pytorch-list](https://github.com/bharathgs/Awesome-pytorch-list). We download all the repositories as zip
files, extract them, remove non-Python files, and split the rest randomly into training and validation datasets.
We train a character-level model without any tokenization of the source code, since that is the simplest approach.
### Try it yourself
1. Clone this repo
2. Install requirements from `requirements.txt`
3. Run `python_autocomplete/create_dataset.py`.
* It collects repos mentioned in
[PyTorch awesome list](https://github.com/bharathgs/Awesome-pytorch-list)
* Downloads the zip files of the repos
* Extracts the zips
* Removes non-Python files
* Collects all Python code into `data/train.py` and `data/eval.py`
4. Run `python_autocomplete/train.py` to train the model.
*Try changing hyper-parameters like model dimensions and number of layers*.
5. Run `evaluate.py` to evaluate the model.
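To make the training step concrete, here is an illustrative character-level LSTM in PyTorch. The model size, toy data, and loop below are assumptions for demonstration; the repo's actual model and hyper-parameters differ.

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """A tiny character-level language model: embed, LSTM, project."""
    def __init__(self, n_chars: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, d_model)
        self.lstm = nn.LSTM(d_model, d_model, num_layers=2, batch_first=True)
        self.head = nn.Linear(d_model, n_chars)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

# Toy "dataset": a single snippet, encoded at character level.
text = "def add(a, b):\n    return a + b\n"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([[stoi[c] for c in text]])

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(5):  # a few demo steps; real training runs far longer
    logits, _ = model(ids[:, :-1])               # predict each next character
    loss = loss_fn(logits.squeeze(0), ids[0, 1:])
    opt.zero_grad(); loss.backward(); opt.step()
```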
You can also run the training notebook on Google Colab.
[](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/train.ipynb)
### VSCode extension
1. Clone this repo
2. Install requirements from `requirements.txt`
3. Install npm packages
You need to have [Node.js](https://nodejs.dev/) installed
```shell
cd vscode_extension
npm install # This will install the NPM packages
```
4. Start the server `python_autocomplete/serve.py`
5. Open the extension project (folder) in [VSCode](https://code.visualstudio.com/)
```shell
cd vscode_extension
code . # This will open vscode_extension in VSCode
```
If you don't have the [VSCode command line launcher](https://code.visualstudio.com/docs/setup/mac#_launching-from-the-command-line),
start VSCode and open the project with `File > Open`
6. Run the extension from VSCode
```
Run > Start Debugging
```
This will open another VSCode editor window with the extension loaded
7. Create or open a Python file and start editing!
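For intuition on what the server does per request, here is a hedged sketch of greedy character-level completion: append the most likely next character until confidence drops. `next_char_probs` is a hypothetical stand-in for a call into the trained model, and the confidence threshold is an illustrative assumption, not the repo's actual stopping rule.

```python
def complete(prefix: str, next_char_probs, max_len: int = 20) -> str:
    """Greedily extend `prefix` one character at a time."""
    suggestion = ""
    for _ in range(max_len):
        probs = next_char_probs(prefix + suggestion)  # dict: char -> prob
        ch, p = max(probs.items(), key=lambda kv: kv[1])
        if p < 0.5:          # stop when the model is no longer confident
            break
        suggestion += ch
    return suggestion
```

The editor then shows `suggestion` inline, and a single TAB/ENTER accepts the whole string.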
### Sample
Here's a sample evaluation of a trained transformer model.
Colors:
* Yellow: the predicted token is wrong and the user has to type that character.
* Blue: the predicted token is correct and the user selects it with a special key press,
such as TAB or ENTER.
* Green: characters autocompleted based on the prediction.