https://github.com/hiroakimikami/mlprogram

PyTorch library for synthesizing programs from natural language

deep-learning deep-neural-network deeplearning natural-language-understanding nl2code program-synthesis pytorch treegen


Host: GitHub
URL: https://github.com/hiroakimikami/mlprogram
Owner: HiroakiMikami
License: mit
Created: 2018-11-03T08:05:26.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2021-03-28T05:44:44.000Z (over 5 years ago)
Last Synced: 2023-08-07T00:36:50.224Z (almost 3 years ago)
Topics: deep-learning, deep-neural-network, deeplearning, natural-language-understanding, nl2code, program-synthesis, pytorch, treegen
Language: Python
Homepage:
Size: 9.04 MB
Stars: 17
Watchers: 3
Forks: 3
Open Issues: 28
Metadata Files:
- Readme: README.md
- License: LICENSE.md


mlprogram
===

A Library of Deep Learning (Machine Learning) for Programming Tasks.

It provides a toolbox for implementing and evaluating deep learning methods related to programming.

Purpose
---

The main purpose of this repository is to make my experiments easier. Recently, many papers have proposed deep learning methods for programming tasks, such as programming by example and automatic program repair. However, many of them require complex and error-prone implementations. For example, beam search decoding constrained by a programming language grammar is complex and unique to this task. I want to create and maintain well-tested implementations of such algorithms.
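To illustrate why grammar-constrained decoding needs care, here is a minimal sketch of beam search over a toy context-free grammar. It is not taken from mlprogram: the grammar, the `score_rule` stand-in for a learned model, and the `beam_search` helper are all hypothetical names chosen for illustration.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Toy grammar (hypothetical): nonterminal -> possible right-hand sides.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "Expr": [("Num",), ("Expr", "+", "Expr")],
    "Num": [("1",), ("2",)],
}

Hypothesis = Tuple[float, Tuple[str, ...]]


def score_rule(lhs: str, rhs: Tuple[str, ...]) -> float:
    """Stand-in for a model: uniform log-probability over the rules of `lhs`."""
    return -math.log(len(GRAMMAR[lhs]))


def beam_search(start: str, beam_width: int, max_steps: int) -> List[Hypothesis]:
    """Expand the leftmost nonterminal of each hypothesis, keeping the best
    `beam_width` partial programs per step. Returns completed programs,
    best first."""
    beam: List[Hypothesis] = [(0.0, (start,))]
    finished: List[Hypothesis] = []
    for _ in range(max_steps):
        candidates: List[Hypothesis] = []
        for logp, symbols in beam:
            # Index of the leftmost nonterminal, or None if the program is complete.
            idx = next((i for i, s in enumerate(symbols) if s in GRAMMAR), None)
            if idx is None:
                finished.append((logp, symbols))
                continue
            lhs = symbols[idx]
            # Only rules allowed by the grammar are considered, so every
            # hypothesis stays syntactically valid by construction.
            for rhs in GRAMMAR[lhs]:
                new_symbols = symbols[:idx] + rhs + symbols[idx + 1:]
                candidates.append((logp + score_rule(lhs, rhs), new_symbols))
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return sorted(finished, key=lambda c: c[0], reverse=True)


if __name__ == "__main__":
    for logp, symbols in beam_search("Expr", beam_width=4, max_steps=6)[:3]:
        print(round(logp, 3), " ".join(symbols))
```

Even in this toy setting, the decoder has to track which nonterminals remain open and which rules are legal at each step. That bookkeeping grows quickly with a real language grammar.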

### Focuses

* Library for handling programming languages in deep learning tasks

* Utilities for benchmark datasets of various tasks

* Simple baseline solution for program generation

For now, I do not prioritize re-implementing existing papers.

The field of machine learning for programming is still immature. There are no de facto benchmark tasks (comparable to image classification with ImageNet or object detection with COCO in computer vision), nor is there a de facto standard model (comparable to ResNet in computer vision).

Feature Lists and Plans
---

* Benchmark dataset

    * Auto Repair

        * DeepFix: [the official repository](https://bitbucket.org/iiscseal/deepfix/src/master/)

    * Program Synthesis from Natural Language

        * Hearthstone: [Latent Predictor Networks for Code Generation](https://arxiv.org/abs/1603.06744)

        * Django: [Learning to Generate Pseudo-code from Source Code Using Statistical Machine Translation, ASE 2015](https://ieeexplore.ieee.org/document/7372045)

        * NL2Bash: [nl2bash](https://github.com/TellinaTool/nl2bash)

        * (TODO) Spider: [Spider 1.0 Yale Semantic Parsing and Text-to-SQL Challenge](https://yale-lily.github.io/spider)

    * Programming by Examples

        * 2D CSG

        * (TODO) DeepCoder

        * (TODO) ShapeNet

* Deep Learning Models

    * Attention Based LSTM

    * AST LSTM (based on NL2Code)

* Program Synthesis Methods

    * supervised training

    * reinforcement learning for programming by example

    * (TODO) interpreter approximated by DNN

* Other Papers

    * [NL2Code](https://arxiv.org/abs/1704.01696): [the official repository](https://github.com/pcyin/NL2code/)

    * [TreeGen](https://arxiv.org/abs/1911.09983): [the official repository](https://github.com/zysszy/TreeGen)

    * [PbE with REPL](http://arxiv.org/abs/1906.04604): [the official repository](https://github.com/flxsosa/ProgramSearch)

Benchmark
---

### NL2Prog (Hearthstone)

|Method|#params [MiB]|training time [min]|max time per example [sec]|BLEU@top1|config name|
|:-----|------------:|------------------:|-------------------------:|--------:|:----------|
|tree LSTM|       7.7|                 92|                        15|  0.75020|`hearthstone/baseline_evaluate_short`|
|tree LSTM|       7.7|                 92|                       180|  0.76540|`hearthstone/baseline_evaluate_long`|

### Programming by Example without Inputs (CSG)

|Method                          |#params [MiB]|training time [min]|max time per example [sec]|generation rate|config file|
|:-------------------------------|------------:|------------------:|-------------------------:|--------------:|:----------|
|tree LSTM                       |16           |75                 |30                        |18/30|`csg/baseline_evaluate_short`|
|tree LSTM                       |16           |75                 |360                       |22/30|`csg/baseline_evaluate_long`|
|tree LSTM + REINFORCESynthesizer|16           |75                 |30                        |18/30|`csg/baseline_evaluate_rl_synthesizer_short`|
|tree LSTM + REINFORCESynthesizer|16           |75                 |360                       |22/30|`csg/baseline_evaluate_rl_synthesizer_short`|

### Auto Repair

TODO

Usage Examples
---

`tools/launch.py` is the launcher script, and the `configs` directory contains example configurations.

### Train/Evaluate NL2Code with Hearthstone Dataset

This requires a CUDA-enabled GPU.

```bash
$ python tools/launch.py --config configs/nl2code/nl2code_train.py
$ python tools/launch.py --config configs/nl2code/nl2code_evaluate.py
```

Warning
---

* The implementation is highly experimental, and I may change it significantly.

* The reproduced algorithms may differ from the authors' implementations. For example, the original implementation of NL2Code uses the Python 2.7.x grammar, while this repository uses the grammar of the Python version it runs on.
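
The dependence on the running Python version can be seen with the standard-library `ast` module, whose node types follow the grammar of the interpreter that executes it. The sketch below is only illustrative and does not use mlprogram's own API; the helper name `node_type_sequence` is hypothetical.

```python
import ast
import sys
from typing import List


def node_type_sequence(source: str) -> List[str]:
    """Parse `source` with the running interpreter's grammar and return the
    AST node type names in the breadth-first order produced by `ast.walk`."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


if __name__ == "__main__":
    # The exact node names depend on the interpreter version printed here.
    print(sys.version_info[:2])
    print(node_type_sequence("x = 1 + 2"))
```

For example, numeric literals appear as `Constant` nodes on Python 3.8 and later but as `Num` nodes on earlier 3.x releases, so the same source can yield different action sequences for a grammar-based model.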
