{"id":20665516,"url":"https://github.com/teelinsan/parallel-decoding","last_synced_at":"2025-04-19T16:37:13.987Z","repository":{"id":181219245,"uuid":"636199466","full_name":"teelinsan/parallel-decoding","owner":"teelinsan","description":"Repository of the paper \"Accelerating Transformer Inference for Translation via Parallel Decoding\"","archived":false,"fork":false,"pushed_at":"2024-03-15T07:54:50.000Z","size":238,"stargazers_count":87,"open_issues_count":0,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-03-15T08:51:25.584Z","etag":null,"topics":["decoding-algorithm","deep-learning","jacobi-decoding","jacobi-iteration","natural-language-processing","neural-network","parallel-decoding","transformers"],"latest_commit_sha":null,"homepage":"https://gladia.di.uniroma1.it/publication/ipi/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/teelinsan.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-04T10:24:15.000Z","updated_at":"2024-03-12T19:26:03.000Z","dependencies_parsed_at":"2023-11-23T10:25:00.873Z","dependency_job_id":"c972230d-c8c6-4d45-9e5f-de7d34d981a5","html_url":"https://github.com/teelinsan/parallel-decoding","commit_stats":null,"previous_names":["teelinsan/parallel-decoding"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teelinsan%2Fparallel-decoding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teelinsan%2Fparallel-decoding/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/teelinsan%2Fparallel-decoding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teelinsan%2Fparallel-decoding/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/teelinsan","download_url":"https://codeload.github.com/teelinsan/parallel-decoding/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224962624,"owners_count":17399287,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decoding-algorithm","deep-learning","jacobi-decoding","jacobi-iteration","natural-language-processing","neural-network","parallel-decoding","transformers"],"created_at":"2024-11-16T19:31:53.435Z","updated_at":"2024-11-16T19:32:12.757Z","avatar_url":"https://github.com/teelinsan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e    \n \n# Accelerating Transformer Inference for Translation via Parallel Decoding\n\n[![Paper](http://img.shields.io/badge/paper-ArXiv-B31B1B.svg)](https://arxiv.org/abs/2305.10427)\n[![Conference](http://img.shields.io/badge/ACL-2023-c92828.svg)](https://aclanthology.org/2023.acl-long.689/)\n\n\u003c/div\u003e\n\n\nThis is the code repository of the paper \"Accelerating Transformer Inference for Translation via Parallel Decoding\" accepted at [ACL 2023 main conference](https://aclanthology.org/2023.acl-long.689/).\n\nThe paper proposes three Parallel Decoding methods to speed up existing autoregressive machine translation models: **Jacobi Decoding**, 
**GS-Jacobi Decoding**, and **Hybrid GS-Jacobi Decoding**.\n\nThis code is not production-ready and should be used for research purposes only.\n\n**Paper**: https://arxiv.org/abs/2305.10427\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"assets/ipi.png\" alt=\"drawing\" width=\"1000\"/\u003e\n\u003c/div\u003e\n\n\n## Reproduce the results\nTo produce the benchmark values for en-ro WMT16:\n1. Install all the requirements with `pip install -r requirements.txt`.\n2. Run the following to retrieve the benchmark values:\n    ```\n    python3 main.py src_lang=\"en\" tgt_lang=\"ro\" device=\"cpu\" dataset.name=\"wmt\" dataset.version=\"16\" task=\"benchmark\" bench.result_dir=\"[your path]\" model.model_name=\"Helsinki-NLP/opus-mt-en-ro\"\n    ```\nIteration speedups and BLEU results should be easy to reproduce. Time speedups depend on whether the underlying hardware and software can run computation in parallel without introducing overheads. Please follow the experimental setting proposed in the paper. The easiest way is to use a fresh virtual machine; we provide the instructions in the Scaling Experiments section of this README.\n\n## Datasets\nAll the datasets are available via `HuggingFace` datasets, so they will be downloaded automatically.\nHowever, `Iwslt` needs to be downloaded manually. In particular, you have to download `2015-01/texts` for `Iwslt15` and `2017-01-trnted/texts` for `Iwslt17`. 
Once downloaded, specify the path as a parameter in the Python command by adding `dataset.data_dir=[your path]` (you can also set it manually in `conf/config.yaml`).\n\n## Table 1 and Table 5 Experiments - Parallel Decoding Algorithms\nTo reproduce the results in Table 1, run the command:\n```\n    /bin/bash ./exp/tab1.sh\n```\nBeforehand, set the `result_dir` path in `tab1.sh` or in the config file `conf/config.yaml`.\n\n## Table 2 and Table 6 Experiments - Cross Languages\nTo reproduce the results in Table 2, run the command:\n```\n    /bin/bash ./exp/tab2.sh\n```\nBeforehand, set the `result_dir` path in `tab2.sh` or in the config file `conf/config.yaml`.\n\n## Figure 3 and Figure 5 - Scaling Experiments\nTo reproduce the scaling experiments, you need to use Google Cloud with `c2d-standard-XX` machine types, where XX is the number of cores used. Then run the command as specified in the section \"Reproduce the results\" of this README.\nTo ease the process, we provide the command to launch the virtual machine using the gcloud CLI.\n\n```\ngcloud compute instances create instance-1 --zone=us-central1-a --machine-type=c2d-standard-8 --network-interface=network-tier=PREMIUM,subnet=default --maintenance-policy=MIGRATE --provisioning-model=STANDARD --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append --create-disk=boot=yes,device-name=instance-1,image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20221015,mode=rw,size=30 --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --reservation-affinity=any\n```\n## Table 3 Experiments - FLOPs calculator\n\nTo reproduce the FLOPs calculation in Table 3, simply run the script:\n```\n    python3 ./exp/flops_calculator.py \n```\n\n## 
Dependency Graph Visualizer (DDGviz)\n\nTo run the Dependency Graph Visualizer (DDGviz), execute the command:\n```\n    PYTHONPATH=. python3 ./src/viz/visualize.py\n```\nYou can select the examples to visualize with the parameter `--examples [list of id in the dataset]`. The dataset and the source/target languages can be selected with the corresponding options; use `--help` for more info.\nThe output DDGviz visualization will be saved in `iteration_matrix/\u003cdataset_hash\u003e/images`.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"assets/ddg.png\" alt=\"drawing\" width=\"500\"/\u003e\n\u003c/div\u003e\n\n\n## Citation\n\nIf you use this code please cite:\n\n```bibtex\n@inproceedings{santilli-etal-2023-accelerating,\n    title = \"Accelerating Transformer Inference for Translation via Parallel Decoding\",\n    author = \"Santilli, Andrea  and\n      Severino, Silvio  and\n      Postolache, Emilian  and\n      Maiorca, Valentino  and\n      Mancusi, Michele  and\n      Marin, Riccardo  and\n      Rodola, Emanuele\",\n    booktitle = \"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    month = jul,\n    year = \"2023\",\n    address = \"Toronto, Canada\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2023.acl-long.689\",\n    pages = \"12336--12355\",\n    abstract = \"Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community proposed specific network architectures and learning-based methods to solve this issue, which are expensive and require changes to the MT model, trading inference speed at the cost of the translation quality. In this paper, we propose to address the problem from the point of view of decoding algorithms, as a less explored but rather compelling direction. 
We propose to reframe the standard greedy autoregressive decoding of MT with a parallel formulation leveraging Jacobi and Gauss-Seidel fixed-point iteration methods for fast inference. This formulation allows to speed up existing models without training or modifications while retaining translation quality. We present three parallel decoding algorithms and test them on different languages and models showing how the parallelization introduces a speedup up to 38{\\%} w.r.t. the standard autoregressive decoding and nearly 2x when scaling the method on parallel resources. Finally, we introduce a decoding dependency graph visualizer (DDGviz) that let us see how the model has learned the conditional dependence between tokens and inspect the decoding procedure.\",\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteelinsan%2Fparallel-decoding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteelinsan%2Fparallel-decoding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteelinsan%2Fparallel-decoding/lists"}