{"id":13752796,"url":"https://github.com/google-research/bigbird","last_synced_at":"2025-04-04T20:13:34.079Z","repository":{"id":37388397,"uuid":"319724040","full_name":"google-research/bigbird","owner":"google-research","description":"Transformers for Longer Sequences","archived":false,"fork":false,"pushed_at":"2022-09-01T03:29:33.000Z","size":1425,"stargazers_count":599,"open_issues_count":28,"forks_count":106,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-03-28T19:09:54.372Z","etag":null,"topics":["bert","deep-learning","longer-sequences","nlp","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2007.14062","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-08T18:19:35.000Z","updated_at":"2025-03-27T05:32:08.000Z","dependencies_parsed_at":"2022-07-07T22:29:40.232Z","dependency_job_id":null,"html_url":"https://github.com/google-research/bigbird","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbigbird","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbigbird/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbigbird/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbigbird/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https:
//codeload.github.com/google-research/bigbird/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247242680,"owners_count":20907134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","deep-learning","longer-sequences","nlp","transformer"],"created_at":"2024-08-03T09:01:11.120Z","updated_at":"2025-04-04T20:13:34.059Z","avatar_url":"https://github.com/google-research.png","language":"Python","funding_links":[],"categories":["Transformer库与优化"],"sub_categories":[],"readme":"# Big Bird: Transformers for Longer Sequences\n\nNot an official Google product.\n\n# What is BigBird?\nBigBird is a sparse-attention based transformer which extends Transformer based models, such as BERT, to much longer sequences. 
Moreover, BigBird comes along with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle.\n\nAs a consequence of the capability to handle longer context,\nBigBird drastically improves performance on various NLP tasks such as question answering and summarization.\n\nMore details and comparisons can be found in our [presentation](https://docs.google.com/presentation/d/1FdMNqG2b8XYc89_v7-_2sba7Iz6YAlXXWuMxUbrKFK0/preview?resourcekey=0-KHcdpCx83g7a2JNz0h0-6w).\n\n\n# Citation\nIf you find this useful, please cite our [NeurIPS 2020 paper](https://papers.nips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html):\n```\n@article{zaheer2020bigbird,\n  title={Big bird: Transformers for longer sequences},\n  author={Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others},\n  journal={Advances in Neural Information Processing Systems},\n  volume={33},\n  year={2020}\n}\n```\n\n\n# Code\n\nThe most important directory is `core`.\nThere are three main files in `core`.\n\n*   [attention.py](bigbird/core/attention.py):\n    Contains BigBird linear attention mechanism\n*   [encoder.py](bigbird/core/encoder.py):\n    Contains the main long sequence encoder stack\n*   [modeling.py](bigbird/core/modeling.py):\n    Contains packaged BERT and seq2seq transformer models with BigBird attention\n\n\n### Colab/IPython Notebook\n\nA quick fine-tuning demonstration for text classification is provided in\n[imdb.ipynb](bigbird/classifier/imdb.ipynb)\n\n\n### Create GCP Instance\nPlease create a project first and create an instance in a zone which has quota as follows\n\n```bash\ngcloud compute instances create \\\n  bigbird \\\n  --zone=europe-west4-a \\\n  --machine-type=n1-standard-16 \\\n  --boot-disk-size=50GB \\\n  --image-project=ml-images \\\n  --image-family=tf-2-3-1 \\\n  
--maintenance-policy TERMINATE \\\n  --restart-on-failure \\\n  --scopes=cloud-platform\n\ngcloud compute tpus create \\\n  bigbird \\\n  --zone=europe-west4-a \\\n  --accelerator-type=v3-32 \\\n  --version=2.3.1\n\ngcloud compute ssh --zone \"europe-west4-a\" \"bigbird\"\n\n```\n\nFor illustration, we used the instance name `bigbird` and the zone `europe-west4-a`, but feel free to change them.\nMore details about creating Google Cloud TPUs can be found in the [online documentation](https://cloud.google.com/tpu/docs/creating-deleting-tpus#setup_TPU_only).\n\n\n### Installation and checkpoints\n```bash\ngit clone https://github.com/google-research/bigbird.git\ncd bigbird\npip3 install -e .\n```\nYou can find pretrained and fine-tuned checkpoints in our [Google Cloud Storage Bucket](https://console.cloud.google.com/storage/browser/bigbird-transformer).\n\nOptionally, you can download them using `gsutil` as\n```bash\nmkdir -p bigbird/ckpt\ngsutil cp -r gs://bigbird-transformer/ bigbird/ckpt/\n```\n\nThe storage bucket contains:\n- pretrained BERT models in base (`bigbr_base`) and large (`bigbr_large`) sizes. They correspond to BERT/RoBERTa-like encoder-only models. Following the original BERT and RoBERTa implementations, they are transformers with post-normalization, i.e. layer norm is applied after the attention layer. However, following [Rothe et al](https://arxiv.org/abs/1907.12461), we can use them partially in encoder-decoder fashion by coupling the encoder and decoder parameters, as illustrated in the [bigbird/summarization/roberta_base.sh](bigbird/summarization/roberta_base.sh) launch script.\n- pretrained Pegasus Encoder-Decoder Transformer in large size (`bigbp_large`). Again following the original implementation of Pegasus, they are transformers with pre-normalization. They have a full set of separate encoder-decoder weights. 
Also, for long document summarization datasets, we have converted Pegasus checkpoints (`model.ckpt-0`) for each dataset and also provided fine-tuned checkpoints (`model.ckpt-300000`) which work on longer documents.\n- fine-tuned `tf.SavedModel` for long document summarization which can be used directly for prediction and evaluation, as illustrated in the [Colab notebook](bigbird/summarization/eval.ipynb).\n\n\n### Running Classification\n\nTo get started quickly with BigBird, run the classification experiment code in the `classifier` directory.\nTo run the code, simply execute\n\n```shell\nexport GCP_PROJECT_NAME=bigbird-project  # Replace by your project name\nexport GCP_EXP_BUCKET=gs://bigbird-transformer-training/  # Replace\nsh -x bigbird/classifier/base_size.sh\n```\n\n\n## Using BigBird Encoder instead of BERT/RoBERTa\n\nTo use the encoder directly instead of, say, a BERT model, we can use the following\ncode.\n\n```python\nfrom bigbird.core import modeling\n\nbigb_encoder = modeling.BertModel(...)\n```\n\nIt can easily replace [BERT's](https://arxiv.org/abs/1810.04805) encoder.\n\n\nAlternatively, one can also experiment with the layers of the BigBird encoder\n\n```python\nfrom bigbird.core import encoder\n\nonly_layers = encoder.EncoderStack(...)\n```\n\n\n## Understanding Flags \u0026 Config\n\nAll the flags and config are explained in\n`core/flags.py`. Here we explain\nsome of the important config parameters.\n\n`attention_type` is used to select the type of attention we would use. Setting\nit to `block_sparse` runs the BigBird attention module.\n\n```python\nflags.DEFINE_enum(\n    \"attention_type\", \"block_sparse\",\n    [\"original_full\", \"simulated_sparse\", \"block_sparse\"],\n    \"Selecting attention implementation. \"\n    \"'original_full': full attention from original bert. \"\n    \"'simulated_sparse': simulated sparse attention. 
\"\n    \"'block_sparse': blocked implementation of sparse attention.\")\n```\n\n`block_size` is used to define the size of blocks, whereas `num_rand_blocks` is\nused to set the number of random blocks. The code currently uses a window size of\n3 blocks and 2 global blocks. The current code only supports static tensors.\n\nImportant points to note:\n* Hidden dimension should be divisible by the number of heads.\n* Currently the code only handles tensors of static shape as it is primarily designed\nfor TPUs, which only work with statically shaped tensors.\n* For sequence lengths less than 1024, using `original_full` is advised as there\nis no benefit in using sparse BigBird attention.\n\n## Comparisons\nRecently, [Long Range Arena](https://arxiv.org/pdf/2011.04006.pdf) provided a benchmark of six tasks that require longer context, and performed experiments to benchmark all existing long range transformers. The results are shown below. The BigBird model, unlike its counterparts, clearly reduces memory consumption without sacrificing performance.\n\n\u003cimg src=\"https://github.com/google-research/bigbird/blob/master/comparison.png\" width=\"50%\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fbigbird","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2Fbigbird","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fbigbird/lists"}