{"id":13754269,"url":"https://github.com/google-research/tapas","last_synced_at":"2025-05-15T09:08:46.108Z","repository":{"id":37741304,"uuid":"251642256","full_name":"google-research/tapas","owner":"google-research","description":"End-to-end neural table-text understanding models.","archived":false,"fork":false,"pushed_at":"2024-07-22T13:51:30.000Z","size":646,"stargazers_count":1171,"open_issues_count":58,"forks_count":220,"subscribers_count":42,"default_branch":"master","last_synced_at":"2025-04-11T19:55:40.270Z","etag":null,"topics":["nlp-machine-learning","question-answering","table-parsing","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-31T15:14:39.000Z","updated_at":"2025-04-11T09:16:06.000Z","dependencies_parsed_at":"2024-06-19T03:03:10.632Z","dependency_job_id":"946fe7fa-85e5-4e3f-bb25-8cf53fd29be2","html_url":"https://github.com/google-research/tapas","commit_stats":{"total_commits":60,"total_committers":11,"mean_commits":5.454545454545454,"dds":0.5666666666666667,"last_synced_commit":"569a3c31451d941165bd10783f73f494406b3906"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Ftapas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Ftapas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Ftapas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Ftapas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/tapas/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254310520,"owners_count":22049470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp-machine-learning","question-answering","table-parsing","tensorflow"],"created_at":"2024-08-03T09:01:52.685Z","updated_at":"2025-05-15T09:08:41.097Z","avatar_url":"https://github.com/google-research.png","language":"Python","readme":"# TAble PArSing (TAPAS)\n\nCode and checkpoints for training the transformer-based Table QA models introduced\nin the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](#how-to-cite-tapas).\n\n## News\n\n#### 2021/09/15\n* Released code for sparse table attention from [MATE: Multi-view Attention for Table Transformer Efficiency](https://arxiv.org/abs/2109.04312). 

#### 2020/10/19
* Small change to WTQ training example creation
  * Questions with ambiguous cell matches will now be discarded
  * This improves denotation accuracy by ~1 point
  * For more details see [this issue](https://github.com/google-research/tapas/issues/73).
* Added an option to filter table columns by textual overlap with the question
  * Based on the **HEM** method described in section 3.3 of [Understanding tables with intermediate pre-training](https://www.aclweb.org/anthology/2020.findings-emnlp.27/).

#### 2020/10/09
* Released code & models to run TAPAS on [TabFact](https://tabfact.github.io/) for table entailment, a companion for the EMNLP 2020 Findings paper [Understanding tables with intermediate pre-training](https://www.aclweb.org/anthology/2020.findings-emnlp.27/).
* Added a [colab](http://tiny.cc/tapas-tabfact-colab) to try predictions on TabFact
* Added a [new page](https://github.com/google-research/tapas/blob/master/INTERMEDIATE_PRETRAIN_DATA.md) describing the intermediate pre-training process.

#### 2020/08/26
* Added a [colab](http://tiny.cc/tapas-wtq-colab) to try predictions on WTQ

#### 2020/08/05
* New pre-trained models (see Data section below)
* `reset_position_index_per_cell`: new option to train models that reset the position index whenever a new cell starts, instead of using absolute position indices.

#### 2020/06/10
* Bumped TensorFlow to v2.2

#### 2020/06/08
* Released the [pre-training data](https://github.com/google-research/tapas/blob/master/PRETRAIN_DATA.md).

#### 2020/05/07
* Added a [colab](http://tiny.cc/tapas-colab) to try predictions on SQA

## Installation

The easiest way to try out TAPAS with a free GPU/TPU is in our
[Colab](http://tiny.cc/tapas-colab), which shows how to do predictions on [SQA](http://aka.ms/sqa).

The repository uses protocol buffers and requires the `protoc` compiler to run.
You can download the latest binary for your OS [here](https://github.com/protocolbuffers/protobuf/releases).
On Ubuntu/Debian, it can be installed with:

```bash
sudo apt-get install protobuf-compiler
```

Afterwards, clone and install the git repository:

```bash
git clone https://github.com/google-research/tapas
cd tapas
pip install -e .
```
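
To confirm the compiler is available before installing (a quick sanity check, not part of the original instructions):

```bash
# Should print a libprotoc version; if the command is missing, revisit the step above.
protoc --version
```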

The test suite uses the [tox](https://tox.readthedocs.io/en/latest/) library and can be run by calling:

```bash
pip install tox
tox
```

## Models

We provide pre-trained models for different model sizes.

The metrics are computed by our tool and are not the official metrics of the
respective tasks. We provide them so one can verify whether one's own runs
are in the right ballpark. They are medians over three individual runs.

### Models with intermediate pre-training (2020/10/07)

New models based on the ideas discussed in [Understanding tables with intermediate pre-training](https://www.aclweb.org/anthology/2020.findings-emnlp.27/). Learn more about the methods used [here](https://github.com/google-research/tapas/blob/master/INTERMEDIATE_PRETRAIN_DATA.md).

#### WTQ

Trained from Mask LM, intermediate data, SQA, WikiSQL.

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.5062 | [tapas_wtq_wikisql_sqa_inter_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_large.zip)
LARGE | reset | 0.5097 | [tapas_wtq_wikisql_sqa_inter_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_large_reset.zip)
BASE | noreset | 0.4525 | [tapas_wtq_wikisql_sqa_inter_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_base.zip)
BASE | reset | 0.4638 | [tapas_wtq_wikisql_sqa_inter_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_base_reset.zip)
MEDIUM | noreset | 0.4324 | [tapas_wtq_wikisql_sqa_inter_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_medium.zip)
MEDIUM | reset | 0.4324 | [tapas_wtq_wikisql_sqa_inter_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_medium_reset.zip)
SMALL | noreset | 0.3681 | [tapas_wtq_wikisql_sqa_inter_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_small.zip)
SMALL | reset | 0.3762 | [tapas_wtq_wikisql_sqa_inter_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_small_reset.zip)
MINI | noreset | 0.2783 | [tapas_wtq_wikisql_sqa_inter_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_mini.zip)
MINI | reset | 0.2854 | [tapas_wtq_wikisql_sqa_inter_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_mini_reset.zip)
TINY | noreset | 0.0823 | [tapas_wtq_wikisql_sqa_inter_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_tiny.zip)
TINY | reset | 0.1039 | [tapas_wtq_wikisql_sqa_inter_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wtq_wikisql_sqa_inter_masklm_tiny_reset.zip)

#### WIKISQL

Trained from Mask LM, intermediate data, SQA.

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.8948 | [tapas_wikisql_sqa_inter_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_large.zip)
LARGE | reset | 0.8979 | [tapas_wikisql_sqa_inter_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_large_reset.zip)
BASE | noreset | 0.8859 | [tapas_wikisql_sqa_inter_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_base.zip)
BASE | reset | 0.8855 | [tapas_wikisql_sqa_inter_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_base_reset.zip)
MEDIUM | noreset | 0.8766 | [tapas_wikisql_sqa_inter_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_medium.zip)
MEDIUM | reset | 0.8773 | [tapas_wikisql_sqa_inter_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_medium_reset.zip)
SMALL | noreset | 0.8552 | [tapas_wikisql_sqa_inter_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_small.zip)
SMALL | reset | 0.8615 | [tapas_wikisql_sqa_inter_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_small_reset.zip)
MINI | noreset | 0.8063 | [tapas_wikisql_sqa_inter_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_mini.zip)
MINI | reset | 0.82 | [tapas_wikisql_sqa_inter_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_mini_reset.zip)
TINY | noreset | 0.3198 | [tapas_wikisql_sqa_inter_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_tiny.zip)
TINY | reset | 0.6046 | [tapas_wikisql_sqa_inter_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_wikisql_sqa_inter_masklm_tiny_reset.zip)

#### TABFACT

Trained from Mask LM, intermediate data.

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.8101 | [tapas_tabfact_inter_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_large.zip)
LARGE | reset | 0.8159 | [tapas_tabfact_inter_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_large_reset.zip)
BASE | noreset | 0.7856 | [tapas_tabfact_inter_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_base.zip)
BASE | reset | 0.7918 | [tapas_tabfact_inter_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_base_reset.zip)
MEDIUM | noreset | 0.7585 | [tapas_tabfact_inter_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_medium.zip)
MEDIUM | reset | 0.7587 | [tapas_tabfact_inter_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_medium_reset.zip)
SMALL | noreset | 0.7321 | [tapas_tabfact_inter_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_small.zip)
SMALL | reset | 0.7346 | [tapas_tabfact_inter_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_small_reset.zip)
MINI | noreset | 0.6166 | [tapas_tabfact_inter_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_mini.zip)
MINI | reset | 0.6845 | [tapas_tabfact_inter_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_mini_reset.zip)
TINY | noreset | 0.5425 | [tapas_tabfact_inter_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_tiny.zip)
TINY | reset | 0.5528 | [tapas_tabfact_inter_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_tabfact_inter_masklm_tiny_reset.zip)

#### SQA

Trained from Mask LM, intermediate data.

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.7223 | [tapas_sqa_inter_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_large.zip)
LARGE | reset | 0.7289 | [tapas_sqa_inter_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_large_reset.zip)
BASE | noreset | 0.6737 | [tapas_sqa_inter_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_base.zip)
BASE | reset | 0.6874 | [tapas_sqa_inter_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_base_reset.zip)
MEDIUM | noreset | 0.6464 | [tapas_sqa_inter_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_medium.zip)
MEDIUM | reset | 0.6561 | [tapas_sqa_inter_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_medium_reset.zip)
SMALL | noreset | 0.5876 | [tapas_sqa_inter_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_small.zip)
SMALL | reset | 0.6155 | [tapas_sqa_inter_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_small_reset.zip)
MINI | noreset | 0.4574 | [tapas_sqa_inter_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_mini.zip)
MINI | reset | 0.5148 | [tapas_sqa_inter_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_mini_reset.zip)
TINY | noreset | 0.2004 | [tapas_sqa_inter_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_tiny.zip)
TINY | reset | 0.2375 | [tapas_sqa_inter_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_sqa_inter_masklm_tiny_reset.zip)

#### INTERMEDIATE

Trained from Mask LM.

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.9309 | [tapas_inter_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_large.zip)
LARGE | reset | 0.9317 | [tapas_inter_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_large_reset.zip)
BASE | noreset | 0.9134 | [tapas_inter_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_base.zip)
BASE | reset | 0.9163 | [tapas_inter_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_base_reset.zip)
MEDIUM | noreset | 0.8988 | [tapas_inter_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_medium.zip)
MEDIUM | reset | 0.9005 | [tapas_inter_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_medium_reset.zip)
SMALL | noreset | 0.8788 | [tapas_inter_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_small.zip)
SMALL | reset | 0.8798 | [tapas_inter_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_small_reset.zip)
MINI | noreset | 0.8218 | [tapas_inter_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_mini.zip)
MINI | reset | 0.8333 | [tapas_inter_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_mini_reset.zip)
TINY | noreset | 0.6359 | [tapas_inter_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_tiny.zip)
TINY | reset | 0.6615 | [tapas_inter_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_10_07/tapas_inter_masklm_tiny_reset.zip)

### Small Models & position index reset (2020/08/08)

Based on the pre-trained checkpoints available at the [BERT github page](https://github.com/google-research/bert/blob/master/README.md).
See the page or the [paper](https://arxiv.org/abs/1908.08962) for detailed information on the model dimensions.

**Reset** refers to whether the parameter `reset_position_index_per_cell` was
set to true or false during training. In general, it's recommended to set it to true.

The accuracy depends on the respective task: it's denotation accuracy for
WTQ and WIKISQL, average position accuracy with gold labels for the previous answers for SQA, and Mask-LM accuracy for Mask-LM.

The models were trained in a chain as indicated by the model name.
For example, *sqa_masklm* means the model was first trained on the Mask-LM task and then on SQA. No distillation was performed.

#### WTQ

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.4822 | [tapas_wtq_wikisql_sqa_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_large.zip)
LARGE | reset | 0.4952 | [tapas_wtq_wikisql_sqa_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_large_reset.zip)
BASE | noreset | 0.4288 | [tapas_wtq_wikisql_sqa_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_base.zip)
BASE | reset | 0.4433 | [tapas_wtq_wikisql_sqa_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_base_reset.zip)
MEDIUM | noreset | 0.4158 | [tapas_wtq_wikisql_sqa_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_medium.zip)
MEDIUM | reset | 0.4097 | [tapas_wtq_wikisql_sqa_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_medium_reset.zip)
SMALL | noreset | 0.3267 | [tapas_wtq_wikisql_sqa_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_small.zip)
SMALL | reset | 0.3670 | [tapas_wtq_wikisql_sqa_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_small_reset.zip)
MINI | noreset | 0.2275 | [tapas_wtq_wikisql_sqa_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_mini.zip)
MINI | reset | 0.2409 | [tapas_wtq_wikisql_sqa_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_mini_reset.zip)
TINY | noreset | 0.0901 | [tapas_wtq_wikisql_sqa_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_tiny.zip)
TINY | reset | 0.0947 | [tapas_wtq_wikisql_sqa_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_tiny_reset.zip)

#### WIKISQL

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.8862 | [tapas_wikisql_sqa_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_large.zip)
LARGE | reset | 0.8917 | [tapas_wikisql_sqa_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_large_reset.zip)
BASE | noreset | 0.8772 | [tapas_wikisql_sqa_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_base.zip)
BASE | reset | 0.8809 | [tapas_wikisql_sqa_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_base_reset.zip)
MEDIUM | noreset | 0.8687 | [tapas_wikisql_sqa_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_medium.zip)
MEDIUM | reset | 0.8736 | [tapas_wikisql_sqa_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_medium_reset.zip)
SMALL | noreset | 0.8285 | [tapas_wikisql_sqa_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_small.zip)
SMALL | reset | 0.8550 | [tapas_wikisql_sqa_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_small_reset.zip)
MINI | noreset | 0.7672 | [tapas_wikisql_sqa_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_mini.zip)
MINI | reset | 0.7944 | [tapas_wikisql_sqa_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_mini_reset.zip)
TINY | noreset | 0.3237 | [tapas_wikisql_sqa_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_tiny.zip)
TINY | reset | 0.3608 | [tapas_wikisql_sqa_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_wikisql_sqa_masklm_tiny_reset.zip)

#### SQA

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.7002 | [tapas_sqa_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_large.zip)
LARGE | reset | 0.7130 | [tapas_sqa_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_large_reset.zip)
BASE | noreset | 0.6393 | [tapas_sqa_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_base.zip)
BASE | reset | 0.6689 | [tapas_sqa_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_base_reset.zip)
MEDIUM | noreset | 0.6026 | [tapas_sqa_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_medium.zip)
MEDIUM | reset | 0.6141 | [tapas_sqa_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_medium_reset.zip)
SMALL | noreset | 0.4976 | [tapas_sqa_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_small.zip)
SMALL | reset | 0.5589 | [tapas_sqa_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_small_reset.zip)
MINI | noreset | 0.3779 | [tapas_sqa_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_mini.zip)
MINI | reset | 0.3687 | [tapas_sqa_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_mini_reset.zip)
TINY | noreset | 0.2013 | [tapas_sqa_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_tiny.zip)
TINY | reset | 0.2194 | [tapas_sqa_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_sqa_masklm_tiny_reset.zip)

#### MASKLM

Size     |  Reset  | Dev Accuracy | Link
-------- | --------| -------- | ----
LARGE | noreset | 0.7513 | [tapas_masklm_large.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_large.zip)
LARGE | reset | 0.7528 | [tapas_masklm_large_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_large_reset.zip)
BASE | noreset | 0.7323 | [tapas_masklm_base.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_base.zip)
BASE | reset | 0.7335 | [tapas_masklm_base_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_base_reset.zip)
MEDIUM | noreset | 0.7059 | [tapas_masklm_medium.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_medium.zip)
MEDIUM | reset | 0.7054 | [tapas_masklm_medium_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_medium_reset.zip)
SMALL | noreset | 0.6818 | [tapas_masklm_small.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_small.zip)
SMALL | reset | 0.6856 | [tapas_masklm_small_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_small_reset.zip)
MINI | noreset | 0.6382 | [tapas_masklm_mini.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_mini.zip)
MINI | reset | 0.6425 | [tapas_masklm_mini_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_mini_reset.zip)
TINY | noreset | 0.4826 | [tapas_masklm_tiny.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_tiny.zip)
TINY | reset | 0.5282 | [tapas_masklm_tiny_reset.zip](https://storage.googleapis.com/tapas_models/2020_08_05/tapas_masklm_tiny_reset.zip)

### Original Models

The pre-trained TAPAS checkpoints can be downloaded here:

* [MASKLM base](https://storage.googleapis.com/tapas_models/2020_04_21/tapas_base.zip)
* [MASKLM large](https://storage.googleapis.com/tapas_models/2020_04_21/tapas_large.zip)
* [SQA base](https://storage.googleapis.com/tapas_models/2020_04_21/tapas_sqa_base.zip)
* [SQA large](https://storage.googleapis.com/tapas_models/2020_04_21/tapas_sqa_large.zip)

The first two models are pre-trained on the Mask-LM task and the last two
on the Mask-LM task first and SQA second.
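
For example, a checkpoint can be fetched and unpacked as follows (a minimal sketch assuming `wget` and `unzip` are available; `${tapas_data_dir}` is the directory used in the commands below):

```bash
# Download the MASKLM base checkpoint and unpack it into the data directory.
wget https://storage.googleapis.com/tapas_models/2020_04_21/tapas_base.zip
unzip tapas_base.zip -d "${tapas_data_dir}"
```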

## Fine-Tuning Data

You also need to download the task data for the fine-tuning tasks:

* [SQA](http://aka.ms/sqa)
* [WikiSQL](https://github.com/salesforce/WikiSQL)
* [WTQ 1.0](https://github.com/ppasupat/WikiTableQuestions)
* [TabFact](https://github.com/wenhuchen/Table-Fact-Checking)

## Pre-Training

Note that you can skip pre-training and just use one of the pre-trained checkpoints provided above.

Information about the pre-training data can be found [here](https://github.com/google-research/tapas/blob/master/PRETRAIN_DATA.md).

The TF examples for pre-training can be created using [Google Dataflow](https://cloud.google.com/dataflow):

```bash
python3 setup.py sdist
python3 tapas/create_pretrain_examples_main.py \
  --input_file="gs://tapas_models/2020_05_11/interactions.txtpb.gz" \
  --vocab_file="gs://tapas_models/2020_05_11/vocab.txt" \
  --output_dir="gs://your_bucket/output" \
  --runner_type="DATAFLOW" \
  --gc_project="your-project" \
  --gc_region="us-west1" \
  --gc_job_name="create-pretrain" \
  --gc_staging_location="gs://your_bucket/staging" \
  --gc_temp_location="gs://your_bucket/tmp" \
  --extra_packages=dist/tapas-0.0.1.dev0.tar.gz
```

You can also run the pipeline locally, but that will take a long time:

```bash
python3 tapas/create_pretrain_examples_main.py \
  --input_file="$data/interactions.txtpb.gz" \
  --output_dir="$data/" \
  --vocab_file="$data/vocab.txt" \
  --runner_type="DIRECT"
```

This will create two tfrecord files for training and testing.
The pre-training can then be started with the command below.
The init checkpoint should be a standard BERT checkpoint.

```bash
python3 tapas/experiments/tapas_pretraining_experiment.py \
  --eval_batch_size=32 \
  --train_batch_size=512 \
  --tpu_iterations_per_loop=5000 \
  --num_eval_steps=100 \
  --save_checkpoints_steps=5000 \
  --num_train_examples=512000000 \
  --max_seq_length=128 \
  --input_file_train="${data}/train.tfrecord" \
  --input_file_eval="${data}/test.tfrecord" \
  --init_checkpoint="${tapas_data_dir}/model.ckpt" \
  --bert_config_file="${tapas_data_dir}/bert_config.json" \
  --model_dir="..." \
  --compression_type="" \
  --do_train
```

Set **compression_type** to **GZIP** if the tfrecords are compressed.
You can start a separate eval job by setting `--nodo_train --do_eval`.
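
A minimal sketch of such an eval job, reusing the flags from the training command above (the `--do_eval` spelling mirrors `--do_train`; verify both against the script's flag definitions):

```bash
# Point --model_dir at the training job's model_dir so new checkpoints are picked up.
python3 tapas/experiments/tapas_pretraining_experiment.py \
  --eval_batch_size=32 \
  --num_eval_steps=100 \
  --max_seq_length=128 \
  --input_file_train="${data}/train.tfrecord" \
  --input_file_eval="${data}/test.tfrecord" \
  --bert_config_file="${tapas_data_dir}/bert_config.json" \
  --model_dir="..." \
  --compression_type="" \
  --nodo_train \
  --do_eval
```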

## Running a fine-tuning task

We need to create the TF examples before starting the training.
For example, for SQA that would look like:

```bash
python3 tapas/run_task_main.py \
  --task="SQA" \
  --input_dir="${sqa_data_dir}" \
  --output_dir="${output_dir}" \
  --bert_vocab_file="${tapas_data_dir}/vocab.txt" \
  --mode="create_data"
```

Optionally, to handle big tables, we can add a `--prune_columns` flag to
apply the **HEM** method described in section 3.3 of our
[paper](https://www.aclweb.org/anthology/2020.findings-emnlp.27/) to discard some columns based on
textual overlap with the sentence.
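
For example, the create-data invocation above with column pruning enabled:

```bash
python3 tapas/run_task_main.py \
  --task="SQA" \
  --input_dir="${sqa_data_dir}" \
  --output_dir="${output_dir}" \
  --bert_vocab_file="${tapas_data_dir}/vocab.txt" \
  --mode="create_data" \
  --prune_columns
```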

Afterwards, training can be started by running:

```bash
python3 tapas/run_task_main.py \
  --task="SQA" \
  --output_dir="${output_dir}" \
  --init_checkpoint="${tapas_data_dir}/model.ckpt" \
  --bert_config_file="${tapas_data_dir}/bert_config.json" \
  --mode="train" \
  --use_tpu
```

This will use the preset hyper-parameters set in `hparam_utils.py`.

It's recommended to start a separate eval job to continuously produce predictions
for the checkpoints created by the training job. Alternatively, you can run
the eval job after training to only get the final results.

```bash
python3 tapas/run_task_main.py \
  --task="SQA" \
  --output_dir="${output_dir}" \
  --init_checkpoint="${tapas_data_dir}/model.ckpt" \
  --bert_config_file="${tapas_data_dir}/bert_config.json" \
  --mode="predict_and_evaluate"
```

Another tool to run experiments is `tapas_classifier_experiment.py`. It's more
flexible than `run_task_main.py` but also requires setting all the hyper-parameters
(via the respective command line flags).

## Evaluation

Here we explain some details about the different tasks.

### SQA

By default, SQA will evaluate using the reference answers of the previous
questions. The numbers in [the paper](#how-to-cite-tapas) (Table 5) are computed
using the more realistic setup where the previous answers are model predictions.
`run_task_main.py` will output additional prediction files for this setup as well
if run on GPU.

### WTQ

For the official evaluation results one should convert the TAPAS predictions to
the WTQ format and run the official evaluation script. This can be done using
`convert_predictions.py`.

### WikiSQL

As discussed in [the paper](#how-to-cite-tapas), our code computes evaluation
metrics that deviate from the official evaluation script (Tables 3 and 10).

## Hardware Requirements

TAPAS is essentially a BERT model and thus has the same [requirements](https://github.com/google-research/bert/blob/master/README.md#out-of-memory-issues).
This means that training the large model with 512 sequence length will
require a TPU.
You can use the option `max_seq_length` to create shorter sequences. This will
reduce accuracy but also make the model trainable on GPUs.
Another option is to reduce the batch size (`train_batch_size`),
but this will likely also affect accuracy.
We added an option `gradient_accumulation_steps` that allows you to split the
gradient computation over multiple batches.
Evaluation with the default test batch size (32) should be possible on GPU.
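
As an illustration only, here is a hypothetical GPU-friendly training invocation combining these options; it assumes `max_seq_length`, `train_batch_size`, and `gradient_accumulation_steps` are exposed as flags of the training scripts, so verify the spellings against the flag definitions before relying on them:

```bash
# Hypothetical sketch: shorter sequences plus gradient accumulation, so the
# effective batch size is still 8 * 4 = 32 examples, without --use_tpu.
python3 tapas/run_task_main.py \
  --task="SQA" \
  --output_dir="${output_dir}" \
  --init_checkpoint="${tapas_data_dir}/model.ckpt" \
  --bert_config_file="${tapas_data_dir}/bert_config.json" \
  --mode="train" \
  --max_seq_length=256 \
  --train_batch_size=8 \
  --gradient_accumulation_steps=4
```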

## <a name="how-to-cite-tapas"></a>How to cite TAPAS?

You can cite the [ACL 2020 paper](https://www.aclweb.org/anthology/2020.acl-main.398/)
and the [EMNLP 2020 Findings paper](https://www.aclweb.org/anthology/2020.findings-emnlp.27/) for the later work on pre-training objectives.

## Disclaimer

This is not an official Google product.

## Contact information

For help or issues, please submit a GitHub issue.