{"id":13407987,"url":"https://github.com/facebookresearch/fairseq","last_synced_at":"2025-12-12T01:02:51.862Z","repository":{"id":37384389,"uuid":"101782647","full_name":"facebookresearch/fairseq","owner":"facebookresearch","description":"Facebook AI Research Sequence-to-Sequence Toolkit written in Python.","archived":false,"fork":false,"pushed_at":"2025-01-09T16:43:36.000Z","size":26079,"stargazers_count":31420,"open_issues_count":1321,"forks_count":6517,"subscribers_count":422,"default_branch":"main","last_synced_at":"2025-05-12T16:22:08.840Z","etag":null,"topics":["artificial-intelligence","python","pytorch"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-08-29T16:26:12.000Z","updated_at":"2025-05-12T13:58:22.000Z","dependencies_parsed_at":"2022-07-18T03:16:50.544Z","dependency_job_id":"bdb9a240-b726-499b-99e7-44c741659614","html_url":"https://github.com/facebookresearch/fairseq","commit_stats":{"total_commits":2309,"total_committers":456,"mean_commits":5.06359649122807,"dds":0.6881766998700736,"last_synced_commit":"ecbf110e1eb43861214b05fa001eff584954f65a"},"previous_names":["pytorch/fairseq"],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Ffairseq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch
%2Ffairseq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Ffairseq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Ffairseq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/fairseq/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253774636,"owners_count":21962207,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","python","pytorch"],"created_at":"2024-07-30T20:00:49.995Z","updated_at":"2025-12-12T01:02:46.842Z","avatar_url":"https://github.com/facebookresearch.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/fairseq_logo.png\" width=\"150\"\u003e\n  \u003cbr /\u003e\n  \u003cbr /\u003e\n  \u003ca href=\"https://opensource.fb.com/support-ukraine\"\u003e\u003cimg alt=\"Support Ukraine\" src=\"https://img.shields.io/badge/Support-Ukraine-FFD500?style=flat\u0026labelColor=005BBB\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/pytorch/fairseq/blob/main/LICENSE\"\u003e\u003cimg alt=\"MIT License\" src=\"https://img.shields.io/badge/license-MIT-blue.svg\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/pytorch/fairseq/releases\"\u003e\u003cimg alt=\"Latest Release\" src=\"https://img.shields.io/github/release/pytorch/fairseq.svg\" /\u003e\u003c/a\u003e\n  \u003ca 
href=\"https://github.com/pytorch/fairseq/actions?query=workflow:build\"\u003e\u003cimg alt=\"Build Status\" src=\"https://github.com/pytorch/fairseq/workflows/build/badge.svg\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://fairseq.readthedocs.io/en/latest/?badge=latest\"\u003e\u003cimg alt=\"Documentation Status\" src=\"https://readthedocs.org/projects/fairseq/badge/?version=latest\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://app.circleci.com/pipelines/github/facebookresearch/fairseq/\"\u003e\u003cimg alt=\"CircleCI Status\" src=\"https://circleci.com/gh/facebookresearch/fairseq.svg?style=shield\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n--------------------------------------------------------------------------------\n\nFairseq(-py) is a sequence modeling toolkit that allows researchers and\ndevelopers to train custom models for translation, summarization, language\nmodeling and other text generation tasks.\n\nWe provide reference implementations of various sequence modeling papers:\n\n\u003cdetails\u003e\u003csummary\u003eList of implemented papers\u003c/summary\u003e\u003cp\u003e\n\n* **Convolutional Neural Networks (CNN)**\n  + [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)\n  + [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)\n  + [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)\n  + [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)\n  + [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)\n* **LightConv and DynamicConv models**\n  + [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)\n* **Long Short-Term Memory (LSTM) networks**\n  + Effective Approaches to 
Attention-based Neural Machine Translation (Luong et al., 2015)\n* **Transformer (self-attention) networks**\n  + Attention Is All You Need (Vaswani et al., 2017)\n  + [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)\n  + [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)\n  + [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/README.adaptive_inputs.md)\n  + [Lexically constrained decoding with dynamic beam allocation (Post \u0026 Vilar, 2018)](examples/constrained_decoding/README.md)\n  + [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)](examples/truncated_bptt/README.md)\n  + [Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)](examples/adaptive_span/README.md)\n  + [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)\n  + [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)\n  + [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)\n  + [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)\n  + [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)](examples/mbart/README.md)\n  + [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples/byte_level_bpe/README.md)\n  + [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md)\n  + [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples/wav2vec/README.md)\n  + [Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 
2020)](examples/pointer_generator/README.md)\n  + [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md)\n  + [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)\n  + [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md)\n  + [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979)\n  + [Self-training and Pre-training are Complementary for Speech Recognition (Xu et al., 2020)](https://arxiv.org/abs/2010.11430)\n  + [Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu et al., 2021)](https://arxiv.org/abs/2104.01027)\n  + [Unsupervised Speech Recognition (Baevski et al., 2021)](https://arxiv.org/abs/2105.11084)\n  + [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition (Xu et al., 2021)](https://arxiv.org/abs/2109.11680)\n  + [VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (Xu et al., 2021)](https://arxiv.org/pdf/2109.14084.pdf)\n  + [VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding (Xu et al., 2021)](https://aclanthology.org/2021.findings-acl.370.pdf)\n  + [NormFormer: Improved Transformer Pretraining with Extra Normalization (Shleifer et al., 2021)](examples/normformer/README.md)\n* **Non-autoregressive Transformers**\n  + Non-Autoregressive Neural Machine Translation (Gu et al., 2017)\n  + Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)\n  + Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 
2019)\n  + Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)\n  + [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)\n* **Finetuning**\n  + [Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al., 2020)](examples/rxf/README.md)\n\n\u003c/p\u003e\u003c/details\u003e\n\n### What's New:\n* May 2023 [Released models for Scaling Speech Technology to 1,000+ Languages (Pratap et al., 2023)](examples/mms/README.md)\n* June 2022 [Released code for wav2vec-U 2.0 from Towards End-to-end Unsupervised Speech Recognition (Liu et al., 2022)](examples/wav2vec/unsupervised/README.md)\n* May 2022 [Integration with xFormers](https://github.com/facebookresearch/xformers)\n* December 2021 [Released Direct speech-to-speech translation code](examples/speech_to_speech/README.md)\n* October 2021 [Released VideoCLIP and VLM models](examples/MMPT/README.md)\n* October 2021 [Released multilingual finetuned XLSR-53 model](examples/wav2vec/README.md)\n* September 2021 [`master` branch renamed to `main`](https://github.com/github/renaming).\n* July 2021 [Released DrNMT code](examples/discriminative_reranking_nmt/README.md)\n* July 2021 [Released Robust wav2vec 2.0 model](examples/wav2vec/README.md)\n* June 2021 [Released XLMR-XL and XLMR-XXL models](examples/xlmr/README.md)\n* May 2021 [Released Unsupervised Speech Recognition code](examples/wav2vec/unsupervised/README.md)\n* March 2021 [Added full parameter and optimizer state sharding + CPU offloading](examples/fully_sharded_data_parallel/README.md)\n* February 2021 [Added LASER training code](examples/laser/README.md)\n* December 2020: [Added Adaptive Attention Span code](examples/adaptive_span/README.md)\n* December 2020: [GottBERT model and code released](examples/gottbert/README.md)\n* November 2020: Adopted the [Hydra](https://github.com/facebookresearch/hydra) configuration framework\n  * [see documentation explaining how to 
use it for new and existing projects](docs/hydra_integration.md)\n* November 2020: [fairseq 0.10.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.10.0)\n* October 2020: [Added R3F/R4F (Better Fine-Tuning) code](examples/rxf/README.md)\n* October 2020: [Deep Transformer with Latent Depth code released](examples/latent_depth/README.md)\n* October 2020: [Added CRISS models and code](examples/criss/README.md)\n\n\u003cdetails\u003e\u003csummary\u003ePrevious updates\u003c/summary\u003e\u003cp\u003e\n\n* September 2020: [Added Linformer code](examples/linformer/README.md)\n* September 2020: [Added pointer-generator networks](examples/pointer_generator/README.md)\n* August 2020: [Added lexically constrained decoding](examples/constrained_decoding/README.md)\n* August 2020: [wav2vec2 models and code released](examples/wav2vec/README.md)\n* July 2020: [Unsupervised Quality Estimation code released](examples/unsupervised_quality_estimation/README.md)\n* May 2020: [Follow fairseq on Twitter](https://twitter.com/fairseq)\n* April 2020: [Monotonic Multihead Attention code released](examples/simultaneous_translation/README.md)\n* April 2020: [Quant-Noise code released](examples/quant_noise/README.md)\n* April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples/megatron_11b/README.md)\n* March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md)\n* February 2020: [mBART model and code released](examples/mbart/README.md)\n* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/main/examples/backtranslation#training-your-own-model-wmt18-english-german)\n* December 2019: [fairseq 0.9.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.9.0)\n* November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)\n* November 2019: [CamemBERT model and code 
released](examples/camembert/README.md)\n* November 2019: [BART model and code released](examples/bart/README.md)\n* November 2019: [XLM-R models and code released](examples/xlmr/README.md)\n* September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md)\n* August 2019: [WMT'19 models released](examples/wmt19/README.md)\n* July 2019: fairseq relicensed under MIT license\n* July 2019: [RoBERTa models and code released](examples/roberta/README.md)\n* June 2019: [wav2vec models and code released](examples/wav2vec/README.md)\n\n\u003c/p\u003e\u003c/details\u003e\n\n### Features:\n\n* multi-GPU training on one machine or across multiple machines (data and model parallel)\n* fast generation on both CPU and GPU with multiple search algorithms implemented:\n  + beam search\n  + Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))\n  + sampling (unconstrained, top-k and top-p/nucleus)\n  + [lexically constrained decoding](examples/constrained_decoding/README.md) (Post \u0026 Vilar, 2018)\n* [gradient accumulation](https://fairseq.readthedocs.io/en/latest/getting_started.html#large-mini-batch-training-with-delayed-updates) enables training with large mini-batches even on a single GPU\n* [mixed precision training](https://fairseq.readthedocs.io/en/latest/getting_started.html#training-with-half-precision-floating-point-fp16) (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))\n* [extensible](https://fairseq.readthedocs.io/en/latest/overview.html): easily register new models, criterions, tasks, optimizers and learning rate schedulers\n* [flexible configuration](docs/hydra_integration.md) based on [Hydra](https://github.com/facebookresearch/hydra) allowing a combination of code, command-line and file based configuration\n* [full parameter and optimizer state sharding](examples/fully_sharded_data_parallel/README.md)\n* [offloading parameters to 
CPU](examples/fully_sharded_data_parallel/README.md)\n\nWe also provide [pre-trained models for translation and language modeling](#pre-trained-models-and-examples)\nwith a convenient `torch.hub` interface:\n\n``` python\nen2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')\nen2de.translate('Hello world', beam=5)\n# 'Hallo Welt'\n```\n\nSee the PyTorch Hub tutorials for [translation](https://pytorch.org/hub/pytorch_fairseq_translation/)\nand [RoBERTa](https://pytorch.org/hub/pytorch_fairseq_roberta/) for more examples.\n\n# Requirements and Installation\n\n* [PyTorch](http://pytorch.org/) version \u003e= 1.10.0\n* Python version \u003e= 3.8\n* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)\n* **To install fairseq** and develop locally:\n\n``` bash\ngit clone https://github.com/pytorch/fairseq\ncd fairseq\npip install --editable ./\n\n# on macOS:\n# CFLAGS=\"-stdlib=libc++\" pip install --editable ./\n\n# to install the latest stable release (0.10.x)\n# pip install fairseq\n```\n\n* **For faster training** install NVIDIA's [apex](https://github.com/NVIDIA/apex) library:\n\n``` bash\ngit clone https://github.com/NVIDIA/apex\ncd apex\npip install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" \\\n  --global-option=\"--deprecated_fused_adam\" --global-option=\"--xentropy\" \\\n  --global-option=\"--fast_multihead_attn\" ./\n```\n\n* **For large datasets** install [PyArrow](https://arrow.apache.org/docs/python/install.html#using-pip): `pip install pyarrow`\n* If you use Docker make sure to increase the shared memory size either with `--ipc=host` or `--shm-size`\n as command line options to `nvidia-docker run`.\n\n# Getting Started\n\nThe [full documentation](https://fairseq.readthedocs.io/) contains instructions\nfor getting started, training new models and extending fairseq with new model\ntypes and tasks.\n\n# Pre-trained models and examples\n\nWe provide 
pre-trained models and pre-processed, binarized test sets for several tasks listed below,\nas well as example training and evaluation commands.\n\n* [Translation](examples/translation/README.md): convolutional and transformer models are available\n* [Language Modeling](examples/language_model/README.md): convolutional and transformer models are available\n\nWe also have more detailed READMEs to reproduce results from specific papers:\n\n* [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale (Babu et al., 2021)](examples/wav2vec/xlsr/README.md)\n* [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)\n* [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples/wav2vec/README.md)\n* [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md)\n* [Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020)](examples/quant_noise/README.md)\n* [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples/byte_level_bpe/README.md)\n* [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)](examples/mbart/README.md)\n* [Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)](examples/layerdrop/README.md)\n* [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)\n* [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)\n* [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)\n* [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)\n* [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 
2019)](examples/wav2vec/README.md)\n* [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)\n* [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)\n* [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)\n* [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)\n* [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)\n* [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)\n* [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)\n* [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/README.conv.md)\n\n# Join the fairseq community\n\n* Twitter: https://twitter.com/fairseq\n* Facebook page: https://www.facebook.com/groups/fairseq.users\n* Google group: https://groups.google.com/forum/#!forum/fairseq-users\n\n# License\n\nfairseq(-py) is MIT-licensed.\nThe license applies to the pre-trained models as well.\n\n# Citation\n\nPlease cite as:\n\n``` bibtex\n@inproceedings{ott2019fairseq,\n  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},\n  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},\n  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},\n  year = {2019},\n}\n```\n","funding_links":[],"categories":["Toolkits","Python","Natural Language Processing","General","Lua","自然语言处理 (NLP)","NLP","Repos","Model Training and Orchestration","文本数据和NLP","Related Project","🤖Generative AI","Neuronale Übersetzungstools","Frameworks and libraries"],"sub_categories":["Benchmark","General Purpose NLP","2. 
Seq2Seq","Project of Self-supervised Learning","🔧 Framework","Open-Source-Übersetzungsmodelle",":snake: Python"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Ffairseq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2Ffairseq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Ffairseq/lists"}