https://github.com/um-arm-lab/efficient-eng-2-ltl

The associated repo for paper "Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification"
https://github.com/um-arm-lab/efficient-eng-2-ltl

Last synced: about 1 year ago
JSON representation

The associated repo for paper "Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification"

Host: GitHub
URL: https://github.com/um-arm-lab/efficient-eng-2-ltl
Owner: UM-ARM-Lab
Created: 2022-09-15T19:37:58.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-08-16T05:02:43.000Z (almost 3 years ago)
Last Synced: 2025-05-06T19:27:45.843Z (about 1 year ago)
Language: Python
Homepage: https://um-arm-lab.github.io/Efficient-Eng-2-LTL/
Size: 1.89 MB
Stars: 24
Watchers: 10
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification

![Demo](https://i.imgur.com/ynNTrpf.png)

[[Homepage](https://um-arm-lab.github.io/Efficient-Eng-2-LTL/)] [[Paper](https://arxiv.org/abs/2303.08006)] [[Video](https://drive.google.com/file/d/14Sy5y76YglZ6X3Y3ZZBZZiMGBA9gME9G/view?usp=sharing)] [[Poster](https://drive.google.com/file/d/1j0aZoROb1EKC0oRYYBSwBIx4Xp8ElowN/view?usp=sharing)]

> The associated repo for paper "Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification".

## Repo Structure

- Root
- datasets
- [drone-planning](https://arxiv.org/abs/1905.12096)
- [clean-up](http://www.roboticsproceedings.org/rss14/p67.html)
- [pick-and-place](http://www.roboticsproceedings.org/rss14/p67.html)
- augmentation
- paraphrase with GPT-3
- run
- train the models
- inference with constrained decoding

The constrained decoding inference code is based on: [microsoft/semantic_parsing_with_constrained_lm](https://github.com/microsoft/semantic_parsing_with_constrained_lm)

## Reproduce the Results

Following are the instructions for reproducing the result for our model. For baselines, come and check out [this link](https://github.com/UM-ARM-Lab/Efficient-Eng-2-LTL/issues/1).
### Environment Setup

Install the dependencies:

```bash
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113 # make sure the version is compatible with your cuda version
pip install transformers datasets
pip install sentencepiece
pip install jsons appdirs blobfile cached-property httpx typer whoosh more_itertools
pip install --upgrade protobuf==3.20.0
```

Download BART-large model:

```bash
cd ./run
python ./semantic_parsing_with_constrained_lm/finetune/download_huggingface_lms.py
```

### Prepare the Dataset

The processed dataset (with augmentation from LLM) in already included in the repo. This step is only needed if you want to reprocess the dataset.

To actually process the raw dataset, you can follow the steps below:

1. Pre-process: In each of the three dataset folders, run all cells in "preprocess.ipynb" to generate the processed dataset. （the annotation result is included in the notebook).
2. Augmentation: For each of the three datasets, run all commands in "augment.ipynb" to generate the augmented dataset. Note that this step requires a GPT-3 API key.
3. Move to training folder: You then need to reformat the dataset and move it to the `run/semantic_parsing_with_constrained_lm/domains/ltl/data` folder. A script will be provided later to help you automate this process.

### Train

In our paper, we use the [BART-large model](https://huggingface.co/facebook/bart-large) because it is efficient to fine-tune on a single GPU. Our proposed method can be easily applied to other potentially stronger language models like [T5-XXL](https://arxiv.org/abs/1910.10683) or [GPT-3](https://arxiv.org/abs/2005.14165).

```sh
export PRETRAINED_MODEL_DIR=huggingface_models/bart-large
export TRAINED_MODEL_DIR=trained_models/

cd ./run
DOMAIN=TODO # for example, DOMAIN=pick-syn-aug
python -m semantic_parsing_with_constrained_lm.finetune.lm_finetune \
--config-name semantic_parsing_with_constrained_lm.finetune.configs.emnlp_train_config \
--exp-names ltl_${DOMAIN}_utterance
```

Here DOMAIN determines which experiment to run.
DOMAIN: {dataset_name}-{experiment_name}

- dataset_name: {drone, cleanup, pick}
- experiment_name:
- syn-aug: synthetic with augmentation
- syn: synthetic without augmentation
- golden-cross0-split{0,1,2,3,4}: golden dataset with cross-validation

### Inference

```sh
export PRETRAINED_MODEL_DIR=huggingface_models/bart-large
export TRAINED_MODEL_DIR=trained_models/

DOMAIN=TODO

python -m semantic_parsing_with_constrained_lm.run_exp \
--config-name semantic_parsing_with_constrained_lm.configs.ltl_config \
--log-dir logs/ \
--model Bart \
--eval-split test-full \
--exp-names "ltl_Bart_test-full_${DOMAIN}_constrained_utterance_train-0"
```

The domain name is the same as the training step.

## Cite
```bibtex
@article{pan2023data,
title={Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification},
author={Pan, Jiayi and Chou, Glen and Berenson, Dmitry},
journal={arXiv preprint arXiv:2303.08006},
year={2023}
}
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/um-arm-lab/efficient-eng-2-ltl

Awesome Lists containing this project

README