# code2seq

[![JetBrains Research](https://jb.gg/badges/research.svg)](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)
[![Github action: build](https://github.com/SpirinEgor/code2seq/workflows/Build/badge.svg)](https://github.com/SpirinEgor/code2seq/actions?query=workflow%3ABuild)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

PyTorch implementation of the code2seq model.

## Installation

You can install the model via pip:
```shell
pip install code2seq
```
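
After installation, the package's main entry points can be imported directly. A quick sanity check, using the same module paths as the training example below:

```python
# Verify the install by importing the main entry points
# (these module paths match the training example below).
from code2seq.model import Code2Seq
from code2seq.data.path_context_data_module import PathContextDataModule

print(Code2Seq.__name__, PathContextDataModule.__name__)
```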

## Dataset mining

To prepare your own dataset in a storage format supported by this implementation, use one of the following (a sketch of the resulting format follows the list):
1. The original dataset preprocessing from the [vanilla repository](https://github.com/tech-srl/code2seq).
2. [`astminer`](https://github.com/JetBrains-Research/astminer):
a tool for mining path-based representations and more, with support for multiple languages.
3. [`PSIMiner`](https://github.com/JetBrains-Research/psiminer):
a tool for extracting PSI trees from the IntelliJ Platform and creating datasets from them.
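
These tools can emit path contexts in the storage format of the original code2seq. As a rough, hand-made schematic only (exact token and path separators vary between tools and configurations): each line holds a target label followed by whitespace-separated path contexts, where each context is a comma-separated triple of start token, AST path, and end token.

```
target|label start|token,Path|Node|Types,end|token start2,Path2,end2
```
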
## Available checkpoints

### Method name prediction
| Dataset (with link) | Checkpoint | # epochs | F1-score | Precision | Recall | ChrF |
|-------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|----------|----------|-----------|--------|-------|
| [Java-small](https://s3.eu-west-1.amazonaws.com/datasets.ml.labs.aws.intellij.net/java-paths-methods/java-small.tar.gz) | [link](https://s3.eu-west-1.amazonaws.com/datasets.ml.labs.aws.intellij.net/checkpoints/code2seq_java_small.ckpt) | 11 | 41.49 | 54.26 | 33.59 | 30.21 |
| [Java-med](https://s3.eu-west-1.amazonaws.com/datasets.ml.labs.aws.intellij.net/java-paths-methods/java-med.tar.gz) | [link](https://s3.eu-west-1.amazonaws.com/datasets.ml.labs.aws.intellij.net/checkpoints/code2seq_java_med.ckpt) | 10 | 48.17 | 58.87 | 40.76 | 42.32 |
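
Since `Code2Seq` is a PyTorch Lightning module (it is handed to `Trainer.fit` in the example below), a downloaded checkpoint should be restorable with Lightning's standard `load_from_checkpoint`. A minimal sketch, assuming the checkpoint file has already been downloaded locally:

```python
from code2seq.model import Code2Seq

# Minimal sketch: restore a trained model from a downloaded checkpoint
# via PyTorch Lightning's standard LightningModule API.
model = Code2Seq.load_from_checkpoint("code2seq_java_small.ckpt")
model.eval()  # switch to inference mode
```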

## Configuration

The model is fully configurable through a standalone YAML file.
Navigate to the [config](config) directory for example configs.
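
A minimal sketch of the expected shape, based on the keys the training example below reads (`data_folder`, `data`, `model`, `optimizer`, `train`); the nested values are illustrative placeholders, not recommended settings:

```yaml
data_folder: path/to/dataset   # directory with the preprocessed holdouts

data:                          # passed to PathContextDataModule
  batch_size: 512
model:                         # passed to Code2Seq
  decoder_size: 320
optimizer:                     # optimizer settings
  lr: 0.001
train:
  n_epochs: 10                 # read as config.train.n_epochs
  teacher_forcing: 1.0         # read as config.train.teacher_forcing
```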

## Examples

Model training can be done with the PyTorch Lightning trainer.
See its [documentation](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html) for more information.

```python
from argparse import ArgumentParser

from omegaconf import DictConfig, OmegaConf
from pytorch_lightning import Trainer

from code2seq.data.path_context_data_module import PathContextDataModule
from code2seq.model import Code2Seq


def train(config: DictConfig):
    # Define data module
    data_module = PathContextDataModule(config.data_folder, config.data)

    # Define model
    model = Code2Seq(
        config.model,
        config.optimizer,
        data_module.vocabulary,
        config.train.teacher_forcing,
    )

    # Define hyper parameters
    trainer = Trainer(max_epochs=config.train.n_epochs)

    # Train model
    trainer.fit(model, datamodule=data_module)


if __name__ == "__main__":
    __arg_parser = ArgumentParser()
    __arg_parser.add_argument("config", help="Path to YAML configuration file", type=str)
    __args = __arg_parser.parse_args()

    __config = OmegaConf.load(__args.config)
    train(__config)
```
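
Assuming the snippet above is saved as `train.py` (the file name is arbitrary), training is launched by pointing it at a YAML config, matching the positional `config` argument it parses:

```shell
python train.py path/to/config.yaml
```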