{"id":19932184,"url":"https://github.com/amazon-science/tanl","last_synced_at":"2025-05-03T11:31:30.519Z","repository":{"id":47762913,"uuid":"344601437","full_name":"amazon-science/tanl","owner":"amazon-science","description":"Structured Prediction as Translation between Augmented Natural Languages","archived":false,"fork":false,"pushed_at":"2022-02-02T19:58:38.000Z","size":167,"stargazers_count":134,"open_issues_count":2,"forks_count":23,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-07T15:03:54.045Z","etag":null,"topics":["deep-learning","natural-language-processing","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-04T20:30:55.000Z","updated_at":"2025-03-12T20:49:44.000Z","dependencies_parsed_at":"2022-08-23T09:40:17.886Z","dependency_job_id":null,"html_url":"https://github.com/amazon-science/tanl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Ftanl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Ftanl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Ftanl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Ftanl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.co
m/amazon-science/tanl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252184233,"owners_count":21707912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","natural-language-processing","pytorch"],"created_at":"2024-11-12T23:09:18.924Z","updated_at":"2025-05-03T11:31:30.256Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TANL: Structured Prediction as Translation between Augmented Natural Languages\n\nCode for the paper \"[Structured Prediction as Translation between Augmented Natural Languages](http://arxiv.org/abs/2101.05779)\" (ICLR 2021) and [fine-tuned multi-task model](#fine-tuned-multi-task-model).\n\nIf you use this code, please cite the paper using the bibtex reference below.\n```\n@inproceedings{tanl,\n    title={Structured Prediction as Translation between Augmented Natural Languages},\n    author={Giovanni Paolini and Ben Athiwaratkun and Jason Krone and Jie Ma and Alessandro Achille and Rishita Anubhai and Cicero Nogueira dos Santos and Bing Xiang and Stefano Soatto},\n    booktitle={9th International Conference on Learning Representations, {ICLR} 2021},\n    year={2021},\n}\n```\n\n\n## Requirements\n\n- Python 3.6+\n- PyTorch (tested with version 1.7.1)\n- Transformers (tested with version 4.0.0)\n- NetworkX (tested with version 2.5, only used in coreference resolution)\n\nYou can install all required Python packages with `pip install -r requirements.txt`\n\n\n## Datasets\n\nBy default, 
datasets are expected to be in `data/DATASET_NAME`.\nDataset-specific code is in [datasets.py](datasets.py).\n\nThe CoNLL04 and ADE datasets (joint entity and relation extraction) in the correct format can be downloaded using https://github.com/markus-eberts/spert/blob/master/scripts/fetch_datasets.sh.\nFor other datasets, we provide sample processing code which does not necessarily match the format of publicly available versions (we do not plan to adapt the code to load datasets in other formats).\n\n\n\n## Running the code\n\nUse the following command:\n`python run.py JOB`\n\nThe `JOB` argument refers to a section of the config file, which by default is `config.ini`.\nA [sample config file](config.ini) is provided, with settings that allow for faster training and lower memory usage than the settings used to obtain the final results in the paper.\n\nFor example, to replicate the paper's results on CoNLL04, add the following section to the config file:\n```\n[conll04_final]\ndatasets = conll04\nmodel_name_or_path = t5-base\nnum_train_epochs = 200\nmax_seq_length = 256\nmax_seq_length_eval = 512\ntrain_split = train,dev\nper_device_train_batch_size = 8\nper_device_eval_batch_size = 16\ndo_train = True\ndo_eval = False\ndo_predict = True\nepisodes = 1-10\nnum_beams = 8\n```\nThen run `python run.py conll04_final`.\nNote that the final results will differ slightly from the ones reported in the paper, due to small code changes and randomness.\n\nConfig arguments can be overridden by command-line arguments.\nFor example: `python run.py conll04_final --num_train_epochs 50`.\n\n\n### Additional details\n\nIf `do_train = True`, the model is trained on the given train split (e.g., `'train'`) of the given datasets.\nThe final weights and intermediate checkpoints are written to a directory such as `experiments/conll04_final-t5-base-ep200-len256-b8-train`, with one subdirectory per episode.\nResults in JSON format are also saved there.\n\nIn every episode, the 
model is trained on a different (random) permutation of the training set.\nThe random seed is given by the episode number, so that every episode always produces the exact same model.\n\nOnce a model is trained, it is possible to evaluate it without training again.\nFor this, set `do_train = False` or (more easily) provide the `-e` command-line argument: `python run.py conll04_final -e`.\n\nIf `do_eval = True`, the model is evaluated on the `'dev'` split.\nIf `do_predict = True`, the model is evaluated on the `'test'` split.\n\n\n### Arguments\n\nThe following are the most important command-line arguments for the `run.py` script.\nRun `python run.py -h` for the full list.\n\n- `-c CONFIG_FILE`: specify config file to use (default is `config.ini`)\n- `-e`: only run evaluation (overrides the `do_train` setting in the config file)\n- `-a`: also evaluate intermediate checkpoints, in addition to the final model\n- `-v`: print results for each evaluation run\n- `-g GPU`: specify which GPU to use for evaluation\n\nThe following are the most important arguments for the config file.\nSee the [sample config file](config.ini) to understand the format.\n\n- `datasets` (str): comma-separated list of datasets for training\n- `eval_datasets` (str): comma-separated list of datasets for evaluation (default is the same as for training)\n- `model_name_or_path` (str): path to pretrained model or model identifier from [huggingface.co/models](https://huggingface.co/models) (e.g. 
`t5-base`)\n- `do_train` (bool): whether to run training (default is False)\n- `do_eval` (bool): whether to run evaluation on the `dev` set (default is False)\n- `do_predict` (bool): whether to run evaluation on the `test` set (default is False)\n- `train_split` (str): comma-separated list of data splits for training (default is `train`)\n- `num_train_epochs` (int): number of train epochs\n- `learning_rate` (float): initial learning rate (default is 5e-4)\n- `train_subset` (float \u003e 0 and \u003c=1): portion of training data to effectively use during training (default is 1, i.e., use all training data)\n- `per_device_train_batch_size` (int): batch size per GPU during training (default is 8)\n- `per_device_eval_batch_size` (int): batch size during evaluation (default is 8; only one GPU is used for evaluation)\n- `max_seq_length` (int): maximum input sequence length after tokenization; longer sequences are truncated\n- `max_output_seq_length` (int): maximum output sequence length (default is `max_seq_length`)\n- `max_seq_length_eval` (int): maximum input sequence length for evaluation (default is `max_seq_length`)\n- `max_output_seq_length_eval` (int): maximum output sequence length for evaluation (default is `max_output_seq_length` or `max_seq_length_eval` or `max_seq_length`)\n- `episodes` (str): episodes to run (default is `0`; an interval can be specified, such as `1-4`; the episode number is used as the random seed)\n- `num_beams` (int): number of beams for beam search during generation (default is 1)\n- `multitask` (bool): if True, the name of the dataset is prepended to each input sentence (default is False)\n\nSee [arguments.py](arguments.py) and [transformers.TrainingArguments](https://github.com/huggingface/transformers/blob/master/src/transformers/training_args.py) for additional config arguments.\n\n\n## Fine-tuned multi-task model\n\nThe weights of our multi-task model (released under the [CC BY 4.0 
license](https://creativecommons.org/licenses/by/4.0/)) can be downloaded here: https://tanl.s3.amazonaws.com/tanl-multitask.zip\n\nExtract the zip file in the `experiments/` directory. This will create a subdirectory called `multitask-t5-base-ep50-len512-b8-train,dev-overlap96`. For example, to test the multi-task model on the CoNLL04 dataset, run `python run.py multitask -e --eval_datasets conll04`.\n\nNote that: the `multitask` job is defined in [config.ini](config.ini); the `-e` flag is used to skip training and run evaluation only; the name of the subdirectory containing the weights is compatible with the definition of the `multitask` job.\n\nThe multi-task model was fine-tuned as described in the paper. The results differ slightly from what is reported in the paper due to small code changes.\n\n\n## Licenses\n\nThe code of this repository is released under the [Apache 2.0 license](LICENSE).\nThe weights of the [fine-tuned multi-task model](#fine-tuned-multi-task-model) are released under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Ftanl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Ftanl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Ftanl/lists"}